:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``, ``ws``, ``wss``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-item :term:`named tuple`. This
   corresponds to the general structure of a URL:
   ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, urlparse recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is a :term:`named tuple`, which means that its items can
   be accessed by index or as named attributes, which are:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   As is the case with all named tuples, the subclass has a few additional methods
   and attributes that are particularly useful. One such method is :meth:`_replace`.
   The :meth:`_replace` method will return a new ParseResult object replacing specified
   fields with new values.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      >>> u
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> u._replace(scheme='http')
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')


   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, an allowlist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

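As a minimal sketch of how the *scheme* and *allow_fragments* arguments of
:func:`urlparse` and the :attr:`port` attribute behave, consider the following.
The ``www.example.com`` URLs and the variable names are illustrative
assumptions only.

```python
from urllib.parse import urlparse

# The *scheme* argument is only a fallback for URLs that carry no scheme.
no_scheme = urlparse('//www.example.com/index.html', scheme='https')
has_scheme = urlparse('http://www.example.com/index.html', scheme='https')
print(no_scheme.scheme)   # https
print(has_scheme.scheme)  # http

# With allow_fragments=False the '#...' suffix stays in the preceding component.
kept = urlparse('http://www.example.com/doc#intro', allow_fragments=False)
print(kept.path, repr(kept.fragment))  # /doc#intro ''

# Parsing succeeds; the port is only validated when the attribute is read.
bad_port = urlparse('http://www.example.com:99999/')
port_error = None
try:
    bad_port.port
except ValueError as exc:
    port_error = exc
print(type(port_error).__name__)  # ValueError
```

Because the port check is deferred, code that never reads :attr:`port` will not
see the error for such a URL.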

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if more than
   *max_num_fields* fields are read.

   The optional argument *separator* is the symbol to use for separating the
   query arguments. It defaults to ``&``.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

   .. versionchanged:: 3.10
      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separators. This has been changed to allow only a single
      separator key, with ``&`` as the default separator.

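The options of :func:`parse_qs` described above can be sketched as follows.
The query string and the variable names are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

query = 'name=Ada&lang=en&lang=fr&note='

# Blank values are dropped by default and kept on request.
default = parse_qs(query)
with_blanks = parse_qs(query, keep_blank_values=True)
print(default)      # {'name': ['Ada'], 'lang': ['en', 'fr']}
print(with_blanks)  # {'name': ['Ada'], 'lang': ['en', 'fr'], 'note': ['']}

# max_num_fields rejects inputs with more fields than expected.
try:
    parse_qs(query, max_num_fields=2)
    too_many = False
except ValueError:
    too_many = True

# strict_parsing turns a malformed field (no '=') into an error.
try:
    parse_qs('name=Ada&broken', strict_parsing=True)
    strict_failed = False
except ValueError:
    strict_failed = True

# urlencode(..., doseq=True) turns the name -> list mapping back into a string.
round_trip = urlencode(default, doseq=True)
print(round_trip)   # name=Ada&lang=en&lang=fr
```

Note that the round trip does not restore the blank ``note`` field, since it
was dropped during parsing.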

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised if more than
   *max_num_fields* fields are read.

   The optional argument *separator* is the symbol to use for separating the
   query arguments. It defaults to ``&``.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

   .. versionchanged:: 3.10
      Added *separator* parameter with the default value of ``&``. Python
      versions earlier than Python 3.10 allowed using both ``;`` and ``&`` as
      query parameter separators. This has been changed to allow only a single
      separator key, with ``&`` as the default separator.

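A short sketch of :func:`parse_qsl`, assuming a Python version that provides
the *separator* parameter. The query strings are invented for illustration.

```python
from urllib.parse import parse_qsl, urlencode

# parse_qsl keeps repeated names and the original order of the fields.
pairs = parse_qsl('x=1&y=2&x=3')
print(pairs)       # [('x', '1'), ('y', '2'), ('x', '3')]

# A separator other than '&' has to be requested explicitly.
semi_pairs = parse_qsl('x=1;y=2;x=3', separator=';')
print(semi_pairs)  # [('x', '1'), ('y', '2'), ('x', '3')]

# With the default separator, ';' is ordinary data inside the value.
unsplit = parse_qsl('x=1;y=2')
print(unsplit)     # [('x', '1;y=2')]

# urlencode accepts the list of pairs directly.
rebuilt = urlencode(pairs)
print(rebuilt)     # x=1&y=2&x=3
```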

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

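For illustration, the sketch below builds a URL from six hand-written
components and shows the "equivalent but not identical" case mentioned above.
The component values are assumptions chosen for the example.

```python
from urllib.parse import urlparse, urlunparse

# Any six-item iterable works: scheme, netloc, path, params, query, fragment.
built = urlunparse(('https', 'www.example.com', '/index.html', '', 'q=1', 'top'))
print(built)    # https://www.example.com/index.html?q=1#top

# A trailing '?' with an empty query is dropped on the way back.
original = 'http://www.example.com/path?'
rebuilt = urlunparse(urlparse(original))
print(rebuilt)  # http://www.example.com/path
```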

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-item
   :term:`named tuple`::

      (addressing scheme, network location, path, query, fragment identifier).

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

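The difference between :func:`urlsplit` and :func:`urlparse`, and the derived
attributes in the table above, can be seen in a small sketch. The URL, including
its user name and password, is invented for the example.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://User:Secret@WWW.Example.COM:8080/a;x=1/b;y=2?q=1#frag'

split = urlsplit(url)
parsed = urlparse(url)

# urlsplit leaves ';' parameters inside the path ...
print(split.path)                  # /a;x=1/b;y=2
# ... while urlparse splits off the parameters of the last path segment.
print(parsed.path, parsed.params)  # /a;x=1/b y=2

# The derived attributes are computed from netloc; hostname is lower-cased.
print(split.netloc)    # User:Secret@WWW.Example.COM:8080
print(split.username)  # User
print(split.password)  # Secret
print(split.hostname)  # www.example.com
print(split.port)      # 8080
```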

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, it starts with ``//`` or ``scheme://``),
      the *url*'s hostname and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.


   .. versionchanged:: 3.5

      Behavior updated to match the semantics defined in :rfc:`3986`.

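One possible way to do the preprocessing suggested in the note for
:func:`urljoin` is sketched below. The helper name ``join_same_host`` is
hypothetical and not part of the module, and the sketch does not try to cover
every edge case (for instance, a path that itself begins with ``//``).

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_same_host(base, url):
    """Join *url* onto *base* while ignoring any scheme or netloc in *url*.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Rebuild the reference from path, query and fragment only.
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)


base = 'http://www.cwi.nl/%7Eguido/Python.html'
plain = urljoin(base, '//www.python.org/%7Eguido')
pinned = join_same_host(base, '//www.python.org/%7Eguido')
print(plain)   # http://www.python.org/%7Eguido
print(pinned)  # http://www.cwi.nl/%7Eguido
```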

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

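A brief sketch of :func:`urldefrag`, using an invented URL:

```python
from urllib.parse import urldefrag

result = urldefrag('http://www.example.com/page.html#section-2')
print(result.url)       # http://www.example.com/page.html
print(result.fragment)  # section-2

# The result still unpacks like a plain 2-tuple.
url, fragment = result

# Without a fragment the URL comes back unmodified, paired with ''.
unchanged = urldefrag('http://www.example.com/page.html')
print(tuple(unchanged))  # ('http://www.example.com/page.html', '')
```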

.. function:: unwrap(url)

   Extract the URL from a wrapped URL (that is, a string formatted as
   ``<URL:scheme://host/path>``, ``<scheme://host/path>``, ``URL:scheme://host/path``
   or ``scheme://host/path``). If *url* is not a wrapped URL, it is returned
   without changes.

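As a small illustration, each of the four forms listed for :func:`unwrap`
yields the same bare URL. The ``www.example.com`` address is an assumption made
for the example.

```python
from urllib.parse import unwrap

wrapped_forms = [
    '<URL:http://www.example.com/path>',
    '<http://www.example.com/path>',
    'URL:http://www.example.com/path',
    'http://www.example.com/path',
]

# Every form is reduced to the same unwrapped URL.
unwrapped = [unwrap(form) for form in wrapped_forms]
print(set(unwrapped))  # {'http://www.example.com/path'}
```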

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

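The rules in this section can be sketched with :func:`urlsplit` and
:func:`urljoin`. The URLs are illustrative assumptions.

```python
from urllib.parse import urljoin, urlsplit

text_result = urlsplit('http://www.example.com/a?b=1')
bytes_result = urlsplit(b'http://www.example.com/a?b=1')
print(type(text_result).__name__)   # SplitResult
print(type(bytes_result).__name__)  # SplitResultBytes

# decode() and encode() convert between the two kinds of result.
decoded = bytes_result.decode()
encoded = text_result.encode()

# Mixing str and bytes arguments in one call raises TypeError.
try:
    urljoin('http://www.example.com/', b'page.html')
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Non-ASCII byte values are rejected when the input is decoded.
try:
    urlsplit(b'http://www.example.com/caf\xe9')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
```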

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   .. versionchanged:: 3.7
      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes` object; otherwise a :exc:`TypeError` is raised.

   Note that for a :class:`str` argument, ``quote(string, safe, encoding, errors)``
   is equivalent to ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

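The effect of the *safe* and *encoding* parameters of :func:`quote` can be
sketched as follows. The input strings are made up for illustration.

```python
from urllib.parse import quote, quote_from_bytes

# '/' is safe by default; other reserved characters are escaped.
default = quote('/a path/with?reserved&chars')
print(default)  # /a%20path/with%3Freserved%26chars

# An empty *safe* string causes '/' to be escaped as well.
no_safe = quote('/a path/', safe='')
print(no_safe)  # %2Fa%20path%2F

# *encoding* selects the bytes that get percent-encoded.
latin1 = quote('Niño', encoding='latin-1')
print(latin1)   # Ni%F1o

# For str input, quote() matches encoding first and quoting the bytes.
same = quote('Niño') == quote_from_bytes('Niño'.encode('utf-8'))

# Supplying *encoding* together with bytes input is an error.
try:
    quote(b'raw bytes', encoding='utf-8')
    bytes_error = None
except TypeError as exc:
    bytes_error = exc
```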
559
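   The following sketch illustrates how *safe*, *encoding*, and a
   :class:`bytes` argument affect the result. The sample strings and variable
   names are illustrative assumptions, not part of the API:

   .. code-block:: python

      from urllib.parse import quote

      # Default: '/' is left alone; the space and 'ñ' (as UTF-8) are escaped.
      default_quoted = quote('/El Niño/')             # '/El%20Ni%C3%B1o/'

      # An empty *safe* also escapes '/', e.g. for a single path segment.
      segment_quoted = quote('a/b c', safe='')        # 'a%2Fb%20c'

      # A different *encoding* changes the bytes that get escaped.
      latin1_quoted = quote('ñ', encoding='latin-1')  # '%F1'

      # bytes input is quoted as is; supplying *encoding* raises TypeError.
      bytes_quoted = quote(b'a b')                    # 'a%20b'
      try:
          quote(b'a b', encoding='utf-8')
      except TypeError:
          bytes_with_encoding_rejected = True
      else:
          bytes_with_encoding_rejected = False
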
.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces with plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* defaults to ``''``, so ``'/'`` is
   quoted as well.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


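   As a small comparison, the sketch below quotes the same value with
   :func:`quote_plus` and :func:`quote`. The sample value and variable names
   are assumptions chosen for illustration:

   .. code-block:: python

      from urllib.parse import quote, quote_plus

      value = 'fish & chips/1+1'

      # Spaces become '+'; '&', '/' and the literal '+' are percent-escaped.
      plus_form = quote_plus(value)            # 'fish+%26+chips%2F1%2B1'

      # quote() uses '%20' for spaces and keeps '/' (its default *safe*).
      path_form = quote(value)                 # 'fish%20%26%20chips/1%2B1'

      # Listing '+' in *safe* leaves literal plus signs alone, which makes
      # them indistinguishable from encoded spaces when decoding.
      keep_plus = quote_plus(value, safe='+')  # 'fish+%26+chips%2F1+1'
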
.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


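   The relationship to :func:`quote` can be sketched as follows: encoding a
   :class:`str` yourself and calling :func:`quote_from_bytes` gives the same
   result as letting :func:`quote` do the encoding. The sample text and
   variable names are illustrative assumptions:

   .. code-block:: python

      from urllib.parse import quote, quote_from_bytes

      text = 'El Niño/'

      # quote() encodes the str first, then escapes the resulting bytes.
      via_quote = quote(text, safe='/', encoding='utf-8', errors='strict')

      # The same two steps done explicitly.
      via_bytes = quote_from_bytes(text.encode('utf-8', 'strict'), safe='/')

      # Arbitrary bytes that are not valid UTF-8 can be quoted as well.
      raw_quoted = quote_from_bytes(b'a&\xef')  # 'a%26%EF'
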
.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes with their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

   .. versionchanged:: 3.9
      *string* parameter supports bytes and str objects (previously only str).


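   The effect of *encoding* and *errors* can be sketched with an escape that
   is not valid UTF-8. The input ``'%F1'`` and the variable names are
   assumptions chosen for illustration:

   .. code-block:: python

      from urllib.parse import unquote

      # Valid UTF-8 escapes decode to the expected text.
      decoded = unquote('/El%20Ni%C3%B1o/')            # '/El Niño/'

      # b'\xf1' alone is not valid UTF-8; the default errors='replace'
      # substitutes U+FFFD, the replacement character.
      replaced = unquote('%F1')                        # '\ufffd'

      # The same escape decodes cleanly as Latin-1.
      latin1_decoded = unquote('%F1', encoding='latin-1')  # 'ñ'

      # errors='strict' raises instead of substituting a placeholder.
      try:
          unquote('%F1', errors='strict')
      except UnicodeDecodeError:
          strict_raised = True
      else:
          strict_raised = False
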
.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs with spaces, as required
   for unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


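   A short sketch of the difference from :func:`unquote`, using an
   illustrative input in which one plus sign stands for a space and another
   is percent-escaped:

   .. code-block:: python

      from urllib.parse import unquote, unquote_plus

      encoded = 'a+b%2Bc'

      # '+' is turned into a space; '%2B' becomes a literal plus sign.
      form_value = unquote_plus(encoded)  # 'a b+c'

      # unquote() leaves '+' untouched and only resolves the escape.
      plain_value = unquote(encoded)      # 'a+b+c'
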
.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes with their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.


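   The handling of :class:`str` and :class:`bytes` input can be sketched as
   follows. The sample inputs and variable names are illustrative
   assumptions:

   .. code-block:: python

      from urllib.parse import unquote_to_bytes

      # Escapes map to single octets, so the result need not be valid UTF-8.
      from_str = unquote_to_bytes('a%26%EF')     # b'a&\xef'

      # bytes input gives the same result.
      from_bytes = unquote_to_bytes(b'a%26%EF')  # b'a&\xef'

      # An unescaped non-ASCII character in a str is encoded as UTF-8.
      mixed = unquote_to_bytes('Niño%20')        # b'Ni\xc3\xb1o '
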
.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resultant string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used, which means spaces are
   quoted as a ``'+'`` character and ``'/'`` characters are encoded as
   ``%2F``, following the ``application/x-www-form-urlencoded`` format used
   for HTML form data. An alternate function that can be passed as
   *quote_via* is :func:`quote`, which encodes spaces as ``%20``. Because
   *safe* is passed down to *quote_via* and defaults to ``''``, ``'/'``
   characters are left unencoded only if ``'/'`` is included in *safe*. For
   maximum control of what is quoted, use ``quote`` and specify a value for
   *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element can itself be a sequence; in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urllib.parse.urlencode` function can be used for generating the query
   string of a URL or data for a POST request.

   .. versionchanged:: 3.2
      *query* supports bytes and string objects.

   .. versionadded:: 3.5
      *quote_via* parameter.


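   The sketch below puts *doseq*, *safe*, and *quote_via* together and
   reverses the result with :func:`parse_qs`. The field names and values are
   hypothetical, chosen only for illustration:

   .. code-block:: python

      from urllib.parse import parse_qs, quote, urlencode

      # Hypothetical form fields; the second value is a sequence.
      fields = [('q', 'fish & chips'), ('tag', ['a/b', 'c d'])]

      # Default quoting (quote_plus); doseq=True expands the list into
      # one key=value pair per element.
      form_encoded = urlencode(fields, doseq=True)
      # 'q=fish+%26+chips&tag=a%2Fb&tag=c+d'

      # quote() as quote_via, with '/' explicitly marked as safe.
      percent_encoded = urlencode(fields, doseq=True, safe='/',
                                  quote_via=quote)
      # 'q=fish%20%26%20chips&tag=a/b&tag=c%20d'

      # parse_qs() reverses the encoding; every value comes back as a list.
      round_trip = parse_qs(form_encoded)
      # {'q': ['fish & chips'], 'tag': ['a/b', 'c d']}

      # The result is ASCII text; encode it before using it as POST data.
      post_data = form_encoded.encode('ascii')
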
.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the urllib.parse
      module should conform to this. Certain deviations could be observed,
      which are mostly for backward compatibility purposes and for certain
      de-facto parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.