:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``, ``ws``, ``wss``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-item :term:`named tuple`. This
   corresponds to the general structure of a URL:
   ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up into
   smaller parts (for example, the network location is a single string), and %
   escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, urlparse recognizes
   a netloc only if it is properly introduced by '//'. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is a :term:`named tuple`, which means that its items can
   be accessed by index or as named attributes, which are:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   As is the case with all named tuples, the subclass has a few additional methods
   and attributes that are particularly useful. One such method is :meth:`_replace`.
   The :meth:`_replace` method returns a new :class:`ParseResult` object in which
   the specified fields are replaced with new values.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      >>> u
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> u._replace(scheme='http')
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')


   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

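The effect of the *scheme* and *allow_fragments* arguments of
:func:`urlparse`, and the deferred port check, can be sketched as follows.
This is a minimal illustration that assumes Python 3.6 or later (where an
out-of-range port raises :exc:`ValueError`); the URLs and variable names are
illustrative only.

```python
from urllib.parse import urlparse

# The URL names no scheme, so the *scheme* argument supplies the default.
defaulted = urlparse('//www.cwi.nl:80/%7Eguido/Python.html', scheme='https')
print(defaulted.scheme, defaulted.hostname, defaulted.port)

# With allow_fragments=False the '#intro' text stays in the path.
unsplit = urlparse('http://www.cwi.nl/doc.html#intro', allow_fragments=False)
print(repr(unsplit.path), repr(unsplit.fragment))

# Parsing succeeds; the out-of-range port is rejected only when .port is read.
bad_port = urlparse('http://www.cwi.nl:99999/')
try:
    bad_port.port
except ValueError as exc:
    port_error = str(exc)
    print('invalid port:', port_error)
```

Because the port is validated lazily, code that accepts URLs from outside
sources may want to read :attr:`port` inside a ``try`` block before using the
result.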

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised when more than
   *max_num_fields* fields are read.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.


   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

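A short sketch of how the :func:`parse_qs` options interact, assuming a
version that supports *max_num_fields*. The query string is made up for
illustration.

```python
from urllib.parse import parse_qs, urlencode

qs = 'name=Guido&lang=python&lang=c&empty='

# Repeated names are collected into one list; blank values are dropped.
fields = parse_qs(qs)
print(fields)

# keep_blank_values=True retains 'empty' with an empty-string value.
with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks['empty'])

# doseq=True expands each list back into repeated name=value pairs.
round_trip = urlencode(fields, doseq=True)
print(round_trip)

# The string holds four fields, so a limit of two is exceeded.
try:
    parse_qs(qs, max_num_fields=2)
    limit_hit = False
except ValueError:
    limit_hit = True
print('limit exceeded:', limit_hit)
```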

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised when more than
   *max_num_fields* fields are read.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.

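The following sketch shows that :func:`parse_qsl` preserves the order and
repetition of fields, and how *strict_parsing* and *encoding* change the
outcome. The query strings are illustrative only.

```python
from urllib.parse import parse_qsl, urlencode

# Order and duplicates are kept, unlike the dictionary from parse_qs().
pairs = parse_qsl('b=2&a=1&b=3')
print(pairs)
print(urlencode(pairs))

# A field without '=' is skipped by default ...
lenient = parse_qsl('a=1&novalue')
print(lenient)

# ... but is reported as an error when strict_parsing is true.
try:
    parse_qsl('a=1&novalue', strict_parsing=True)
    strict_failed = False
except ValueError:
    strict_failed = True
print('strict parsing failed:', strict_failed)

# %F1 is 'ñ' in Latin-1; the encoding argument selects that interpretation.
latin = parse_qsl('name=Ni%F1o', encoding='latin-1')
print(latin)
```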

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

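As a small illustration of the "slightly different, but equivalent" result
described for :func:`urlunparse`, the sketch below round-trips a URL with a
dangling ``?`` and then assembles a URL from a plain list. The URLs are
examples only.

```python
from urllib.parse import urlparse, urlunparse

# The trailing '?' introduces an empty query, which is dropped on reassembly.
parts = urlparse('http://www.cwi.nl/%7Eguido/Python.html?')
rebuilt = urlunparse(parts)
print(rebuilt)

# Any six-item iterable works: scheme, netloc, path, params, query, fragment.
assembled = urlunparse(['https', 'www.python.org', '/doc/', '', 'q=1', 'top'])
print(assembled)
```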

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-item
   :term:`named tuple`::

      (addressing scheme, network location, path, query, fragment identifier).

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

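The difference between :func:`urlsplit` and :func:`urlparse` shows up only
when the path carries ``;parameters``. The sketch below uses a made-up URL
with parameters on two path segments.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=1#frag'

# urlparse() splits off the parameters of the last path segment only.
p = urlparse(url)
print(p.path, p.params)

# urlsplit() leaves every ';parameter' inside the path.
s = urlsplit(url)
print(s.path)

# The two results differ in length: six items versus five.
print(len(p), len(s))
```

Splitting the per-segment parameters out of ``s.path`` is left to the caller,
as the text above notes.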

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.


   .. versionchanged:: 3.5

      Behaviour updated to match the semantics defined in :rfc:`3986`.

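The preprocessing suggested in the note for :func:`urljoin` might look like
the sketch below. The ``candidate`` value stands in for input from an
untrusted source and is hypothetical; the sketch does not try to cover every
malformed input.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

base = 'http://www.cwi.nl/%7Eguido/Python.html'
candidate = '//www.python.org/%7Eguido'   # hypothetical untrusted input

# Joined as-is, the candidate's network location replaces the base's.
joined = urljoin(base, candidate)
print(joined)

# Keep only path, query and fragment, then join against the base.
parts = urlsplit(candidate)
relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
print(relative)

pinned = urljoin(base, relative)
print(pinned)
```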

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is a :term:`named tuple`; its items can be accessed by index
   or as named attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing functions.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

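The rules above for bytes input can be illustrated with a brief sketch. The
URLs are examples, and the variable names are not part of the module.

```python
from urllib.parse import urlsplit

text = urlsplit('http://www.python.org/doc/')
raw = urlsplit(b'http://www.python.org/doc/')

# The result type follows the input type.
print(type(text).__name__, type(raw).__name__)
print(raw.netloc)

# decode() and encode() convert between the two result flavours.
print(raw.decode() == text, text.encode() == raw)

# Mixing a bytes URL with a non-empty str *scheme* is rejected.
try:
    urlsplit(b'//www.python.org/doc/', scheme='http')
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Bytes outside the ASCII range cannot be parsed.
try:
    urlsplit(b'http://www.python.org/\xff')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(mixed_error).__name__, type(decode_error).__name__)
```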

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional *safe*
   parameter specifies additional ASCII characters that should not be quoted;
   its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   .. versionchanged:: 3.7
      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

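A few variations on the :func:`quote` example above, as a sketch of how
*safe* and *encoding* affect the output. The strings are illustrative.

```python
from urllib.parse import quote, quote_from_bytes

# By default '/' is left alone, which suits the path part of a URL.
print(quote('/El Niño/'))

# With safe='' the slashes are quoted as well.
print(quote('/El Niño/', safe=''))

# A different encoding changes the bytes that get percent-escaped.
latin = quote('Niño', encoding='latin-1')
print(latin)

# The same result, obtained by encoding first and quoting the bytes.
print(quote_from_bytes('Niño'.encode('latin-1')))

# Supplying *encoding* together with bytes input is an error.
try:
    quote(b'a b', encoding='utf-8')
    bytes_error = None
except TypeError as exc:
    bytes_error = exc
print(type(bytes_error).__name__)
```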

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


560.. function:: unquote(string, encoding='utf-8', errors='replace')
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000561
562 Replace ``%xx`` escapes by their single-character equivalent.
Guido van Rossum52dbbb92008-08-18 21:44:30 +0000563 The optional *encoding* and *errors* parameters specify how to decode
564 percent-encoded sequences into Unicode characters, as accepted by the
565 :meth:`bytes.decode` method.
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000566
Guido van Rossum52dbbb92008-08-18 21:44:30 +0000567 *string* must be a :class:`str`.
568
569 *encoding* defaults to ``'utf-8'``.
570 *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
571 by a placeholder character.
572
573 Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs with spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.
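
The following sketch contrasts :func:`unquote` and :func:`unquote_plus` on a
form value that contains both a space and a literal plus sign. The sample
value is made up for illustration.

```python
from urllib.parse import quote_plus, unquote, unquote_plus

# Spaces become '+', and the literal '+' becomes %2B.
encoded = quote_plus('rock + roll')

# unquote() only decodes %xx escapes, so encoded spaces stay as plus signs.
only_percent = unquote(encoded)

# unquote_plus() maps '+' back to a space and decodes %2B to '+'.
form_value = unquote_plus(encoded)

print(encoded)       # rock+%2B+roll
print(only_percent)  # rock+++roll
print(form_value)    # rock + roll
```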


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes with their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.
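
A short sketch of the two input types, and of the round trip through
:func:`quote_from_bytes`, might look like this. The inputs are illustrative.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Escapes are turned into raw octets; no character decoding takes place.
from_str = unquote_to_bytes('a%26%EF')

# A bytes argument is handled the same way.
from_bytes = unquote_to_bytes(b'a%26%EF')

# An unescaped non-ASCII character in a str is encoded as UTF-8,
# so 'ñ' (U+00F1) becomes the two octets C3 B1.
mixed = unquote_to_bytes('Ni\u00f1o%20')

# quote_from_bytes() turns the octets back into percent-escapes.
round_trip = quote_from_bytes(from_str)

print(from_str)    # b'a&\xef'
print(from_bytes)  # b'a&\xef'
print(mixed)       # b'Ni\xc3\xb1o '
print(round_trip)  # a%26%EF
```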


.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resulting string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, it must be
   encoded to bytes; otherwise a :exc:`TypeError` is raised.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used, which
   means spaces are quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which encodes spaces as ``%20``;
   whether it encodes ``'/'`` characters depends on the *safe* value it
   receives. For maximum control of what is quoted, use ``quote`` and specify
   a value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element can itself be a sequence; in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to see how
   :func:`urlencode` can be used to generate the query string of a URL or the
   data for a POST request.

   .. versionchanged:: 3.2
      The *query* parameter supports bytes and string objects.

   .. versionadded:: 3.5
      *quote_via* parameter.
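
The options described for :func:`urlencode` can be combined as in the
following sketch. The field names and values are hypothetical, and *safe* is
passed explicitly so that the handling of ``'/'`` does not rely on any
default.

```python
from urllib.parse import parse_qs, parse_qsl, quote, urlencode

# Hypothetical form fields, used only for illustration.
fields = [('q', 'El Niño'), ('path', '/a b')]

# Default quote_via=quote_plus: spaces become '+', '/' becomes %2F.
default_query = urlencode(fields)

# quote_via=quote writes spaces as %20; safe='/' is passed down to it,
# which leaves '/' unencoded.
path_query = urlencode(fields, quote_via=quote, safe='/')

# doseq=True expands a sequence value into one key=value pair per element.
multi_query = urlencode({'tag': ['a', 'b']}, doseq=True)

# Data for a POST request must be bytes; the encoded string is pure ASCII.
post_data = default_query.encode('ascii')

# parse_qsl() and parse_qs() reverse the encoding.
pairs = parse_qsl(default_query)
grouped = parse_qs(multi_query)

print(default_query)  # q=El+Ni%C3%B1o&path=%2Fa+b
print(path_query)     # q=El%20Ni%C3%B1o&path=/a%20b
print(multi_query)    # tag=a&tag=b
print(pairs)          # [('q', 'El Niño'), ('path', '/a b')]
print(grouped)        # {'tag': ['a', 'b']}
```

Passing a list of tuples rather than a mapping keeps the parameter order
explicit, which matters when the receiving side is order-sensitive.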


.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to it. Certain deviations may be observed, mostly
      for backward compatibility and for certain de facto parsing requirements
      commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.