:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``, ``ws``, ``wss``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of
   the result, except for a leading slash in the *path* component, which is
   retained if present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.

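As a further illustration of the convenience attributes, here is a minimal sketch assuming CPython 3.8 or later; the URL, user name and password are made up for the example. The comments show the values each ``print()`` call should display, and the last part shows that an out-of-range port is only reported when :attr:`port` is read.

```python
from urllib.parse import urlparse

# Illustrative URL: credentials, a mixed-case host, an explicit port,
# ;params on the last path segment, a query and a fragment.
url = 'https://user:secret@Example.COM:8443/docs/page;v=2?lang=en#intro'
r = urlparse(url)

print(r.scheme, r.netloc)      # https user:secret@Example.COM:8443
print(r.path, r.params)        # /docs/page v=2
print(r.query, r.fragment)     # lang=en intro
print(r.username, r.password)  # user secret
print(r.hostname, r.port)      # example.com 8443

# With allow_fragments=False, '#intro' stays in the query component.
no_frag = urlparse(url, allow_fragments=False)
print(no_frag.query, repr(no_frag.fragment))  # lang=en#intro ''

# Parsing succeeds; the out-of-range port is only reported when .port is read.
bad = urlparse('http://example.com:99999/')
port_error = None
try:
    bad.port
except ValueError as exc:
    port_error = exc
print(port_error)
```
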
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised when the query string contains
   more than *max_num_fields* fields.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.


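A minimal sketch of how *keep_blank_values* and *max_num_fields* change the result, using a made-up query string. It assumes Python 3.8 or later, where *max_num_fields* is available.

```python
from urllib.parse import parse_qs

qs = 'name=Ana&tag=a&tag=b&empty=&flag'

# Default: 'empty=' and the bare 'flag' are dropped, and repeated
# names collect their values into a single list.
default = parse_qs(qs)
print(default)      # {'name': ['Ana'], 'tag': ['a', 'b']}

# keep_blank_values=True keeps both as empty strings.
with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks)  # {'name': ['Ana'], 'tag': ['a', 'b'], 'empty': [''], 'flag': ['']}

# The string holds 5 fields, so a limit of 2 raises ValueError.
limit_error = None
try:
    parse_qs(qs, max_num_fields=2)
except ValueError as exc:
    limit_error = exc
```
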
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   The optional argument *max_num_fields* is the maximum number of fields to
   read. If set, a :exc:`ValueError` is raised when the query string contains
   more than *max_num_fields* fields.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.

   .. versionchanged:: 3.8
      Added *max_num_fields* parameter.


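The following sketch, using an illustrative query string, shows that :func:`parse_qsl` keeps repeated names in their original order and decodes ``+`` and ``%xx`` escapes, and that :func:`urlencode` can turn the pairs back into an equivalent query string.

```python
from urllib.parse import parse_qsl, urlencode

qs = 'tag=a&name=Ana&tag=b&q=x+y%26z'

# Pairs come back in their original order; '+' and %xx escapes are decoded.
pairs = parse_qsl(qs)
print(pairs)  # [('tag', 'a'), ('name', 'Ana'), ('tag', 'b'), ('q', 'x y&z')]

# urlencode() turns the list of pairs back into a query string.
encoded = urlencode(pairs)
print(encoded)  # tag=a&name=Ana&tag=b&q=x+y%26z
```
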
.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).


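A small sketch of building a URL from its six components, and of an unnecessary ``?`` being dropped on the way through :func:`urlparse` and back. The host and paths are placeholders.

```python
from urllib.parse import urlparse, urlunparse

# A plain tuple in (scheme, netloc, path, params, query, fragment) order.
parts = ('https', 'example.com', '/search', '', 'q=python', 'top')
built = urlunparse(parts)
print(built)    # https://example.com/search?q=python#top

# The '?' with an empty query is an unnecessary delimiter and is dropped.
trimmed = urlunparse(urlparse('http://example.com/a?'))
print(trimmed)  # http://example.com/a
```
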
.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   Unmatched square brackets in the :attr:`netloc` attribute will raise a
   :exc:`ValueError`.

   Characters in the :attr:`netloc` attribute that decompose under NFKC
   normalization (as used by the IDNA encoding) into any of ``/``, ``?``,
   ``#``, ``@``, or ``:`` will raise a :exc:`ValueError`. If the URL is
   decomposed before parsing, no error will be raised.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

   .. versionchanged:: 3.8
      Characters that affect netloc parsing under NFKC normalization will
      now raise :exc:`ValueError`.


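The difference from :func:`urlparse` shows up with parameters on more than one path segment. The sketch below compares the two on an illustrative URL and then splits the segments with ``split_segment_params``, a hypothetical helper written only for this example (it is not part of :mod:`urllib.parse`) standing in for the separate function mentioned above.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a;x=1/b;y=2?q=1#frag'

# urlparse() splits params off the last path segment only...
print(urlparse(url)[2:4])  # ('/a;x=1/b', 'y=2')

# ...while urlsplit() leaves every ;params inside the path.
s = urlsplit(url)
print(s.path)              # /a;x=1/b;y=2


def split_segment_params(path):
    """Hypothetical helper: return (segment, params) for each path segment."""
    pairs = []
    for segment in path.split('/'):
        name, _, params = segment.partition(';')
        pairs.append((name, params))
    return pairs


print(split_segment_params(s.path))  # [('', ''), ('a', 'x=1'), ('b', 'y=2')]
```
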
.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a
   ``?`` with an empty query; the RFC states that these are equivalent).


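A minimal sketch with placeholder hosts: any five-item iterable can be recombined, and because the result of :func:`urlsplit` is a named tuple, its ``_replace()`` method (inherited from :func:`collections.namedtuple`) is a convenient way to change one component before recombining.

```python
from urllib.parse import urlsplit, urlunsplit

# Any five-item iterable in (scheme, netloc, path, query, fragment) order;
# a leading '/' is added to the path because a netloc is present.
ftp_url = urlunsplit(['ftp', 'files.example.org', 'pub/file.txt', '', ''])
print(ftp_url)  # ftp://files.example.org/pub/file.txt

# _replace() builds a modified copy that urlunsplit() accepts directly.
s = urlsplit('https://example.com/path?q=1#old')
updated = urlunsplit(s._replace(query='q=2', fragment=''))
print(updated)  # https://example.com/path?q=2
```
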
.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

   .. doctest::

      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
      ...         '//www.python.org/%7Eguido')
      'http://www.python.org/%7Eguido'

   If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
   :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

   .. versionchanged:: 3.5

      Behaviour updated to match the semantics defined in :rfc:`3986`.


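A few more resolutions against one made-up base URL, sketched to show how a sibling file, a ``..`` segment, an absolute path, a query-only reference and a fragment-only reference are combined under the :rfc:`3986` rules (Python 3.5 or later).

```python
from urllib.parse import urljoin

base = 'http://example.com/docs/guide/intro.html'

print(urljoin(base, 'setup.html'))   # http://example.com/docs/guide/setup.html
print(urljoin(base, '../api/'))      # http://example.com/docs/api/
print(urljoin(base, '/index.html'))  # http://example.com/index.html
print(urljoin(base, '?page=2'))      # http://example.com/docs/guide/intro.html?page=2
print(urljoin(base, '#top'))         # http://example.com/docs/guide/intro.html#top
```
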
.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

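A short sketch, with placeholder URLs, of reading the named attributes and of unpacking the result like a 2-tuple:

```python
from urllib.parse import urldefrag

d = urldefrag('http://example.com/page?x=1#section-2')
print(d.url)       # http://example.com/page?x=1
print(d.fragment)  # section-2

# Without a fragment, the URL comes back unchanged with an empty fragment.
url, fragment = urldefrag('http://example.com/page')
print(repr(url), repr(fragment))  # 'http://example.com/page' ''
```
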
.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


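The rules above can be sketched as follows, assuming Python 3.8 or later; the URLs are illustrative. Each failing call is caught so that the exception object can be inspected.

```python
from urllib.parse import urlsplit

# bytes in, bytes out.
b = urlsplit(b'http://example.com/a?q=1')
print(type(b).__name__, b.netloc)  # SplitResultBytes b'example.com'

# decode() yields the str counterpart (ASCII by default).
s = b.decode()
print(type(s).__name__, s.netloc)  # SplitResult example.com

# Mixing str and bytes arguments in one call raises TypeError.
mix_error = None
try:
    urlsplit(b'//example.com/a', scheme='http')
except TypeError as exc:
    mix_error = exc

# Non-ASCII byte values trigger UnicodeDecodeError.
ascii_error = None
try:
    urlsplit(b'http://caf\xc3\xa9.example/')
except UnicodeDecodeError as exc:
    ascii_error = exc
```
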
.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


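A minimal sketch of working with the result classes directly, assuming Python 3.8 or later and placeholder URLs: constructing a :class:`SplitResult`, converting it with :meth:`encode` and :meth:`decode`, and unpacking a :class:`ParseResult` like a plain tuple.

```python
from urllib.parse import SplitResult, urlparse

# The result classes can be built directly from their components.
r = SplitResult('https', 'example.com', '/path', 'q=1', '')
print(r.geturl())  # https://example.com/path?q=1

# encode() returns the bytes counterpart; decode() converts it back.
rb = r.encode()
print(type(rb).__name__, rb.path)  # SplitResultBytes b'/path'
print(rb.decode() == r)            # True

# Results are tuples, so they unpack like any other tuple.
scheme, netloc, path, params, query, fragment = urlparse('http://h/p;x?y#z')
print(path, params, query, fragment)  # /p x y z
```
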
URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   .. versionchanged:: 3.7
      Moved from :rfc:`2396` to :rfc:`3986` for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :exc:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :exc:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.


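A few more illustrative calls, assuming Python 3.7 or later (where ``'~'`` is left unquoted), covering *safe*, *encoding*, :class:`bytes` input and the equivalence with :func:`quote_from_bytes` noted above:

```python
from urllib.parse import quote, quote_from_bytes

print(quote('a/b c?d', safe=''))          # a%2Fb%20c%3Fd
print(quote('~user/file name.txt'))       # ~user/file%20name.txt
print(quote('café', encoding='latin-1'))  # caf%E9
print(quote(b'caf\xc3\xa9'))              # caf%C3%A9

# quote(string, safe, encoding, errors) matches quote_from_bytes()
# applied to the encoded string.
text = 'café menu'
same = (quote(text, '/', 'utf-8', 'strict')
        == quote_from_bytes(text.encode('utf-8', 'strict'), '/'))
print(same)                               # True
```
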
.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


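A sketch contrasting :func:`quote_plus` with :func:`quote` on a made-up form value that contains spaces, plus signs and a slash:

```python
from urllib.parse import quote, quote_plus

value = 'C++ & Python/3'

print(quote_plus(value))            # C%2B%2B+%26+Python%2F3
print(quote_plus(value, safe='/'))  # C%2B%2B+%26+Python/3
print(quote(value))                 # C%2B%2B%20%26%20Python/3
```
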
.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


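A minimal sketch with arbitrary byte strings; since no encoding step takes place, bytes that are not valid UTF-8 are accepted, while a :class:`str` argument is rejected with :exc:`TypeError`.

```python
from urllib.parse import quote_from_bytes

# Octets are escaped one by one.
print(quote_from_bytes(b'a&\xef'))               # a%26%EF
print(quote_from_bytes(b'\x00\xff/x', safe=''))  # %00%FF%2Fx

# A str argument is rejected.
type_error = None
try:
    quote_from_bytes('abc')
except TypeError as exc:
    type_error = exc
```
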
.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.


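A sketch of the effect of *encoding* and the default ``errors='replace'``, using illustrative inputs. The lone byte ``0xE9`` is not valid UTF-8, so with the defaults it decodes to U+FFFD (REPLACEMENT CHARACTER).

```python
from urllib.parse import unquote

text = unquote('/El%20Ni%C3%B1o/')             # '/El Niño/'
latin = unquote('caf%E9', encoding='latin-1')  # 'café'
# Default errors='replace': the invalid UTF-8 byte becomes U+FFFD.
replaced = unquote('caf%E9')                   # 'caf\ufffd'
# '+' is left alone here; unquote_plus() turns it into a space.
plus_kept = unquote('a+b')                     # 'a+b'
print(text, latin, repr(replaced), plus_kept)
```
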
.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


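The difference from :func:`unquote` matters when a form value contains both encoded spaces and an escaped plus sign, as in this made-up example:

```python
from urllib.parse import unquote, unquote_plus

form_value = 'El+Ni%C3%B1o+%2B+Python'

decoded = unquote_plus(form_value)
print(decoded)              # El Niño + Python

# unquote() only expands %xx escapes, so the literal '+' signs remain.
print(unquote(form_value))  # El+Niño+++Python
```
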
.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.


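A short sketch of the accepted input types, using illustrative values; :func:`quote_from_bytes` reverses the operation.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

raw = unquote_to_bytes('a%26%EF')
print(raw)                          # b'a&\xef'

# bytes input is accepted as well.
print(unquote_to_bytes(b'%41%42'))  # b'AB'

# In a str, an unescaped non-ASCII character is encoded as UTF-8.
print(unquote_to_bytes('ñ%20'))     # b'\xc3\xb1 '

# quote_from_bytes() reverses the operation.
print(quote_from_bytes(raw))        # a%26%EF
```
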
.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resultant string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used, which means spaces are
   quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``
   and not encode ``'/'`` characters. For maximum control of what is quoted, use
   ``quote`` and specify a value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.
Nick Coghlan9fc443c2010-11-30 15:48:08 +0000604 string will match the order of parameter tuples in the sequence.
Senthil Kumarandf022da2010-07-03 17:48:22 +0000605
R David Murray8c4e1122014-12-24 21:23:18 -0500606 The *safe*, *encoding*, and *errors* parameters are passed down to
R David Murrayc17686f2015-05-17 20:44:50 -0400607 *quote_via* (the *encoding* and *errors* parameters are only passed
R David Murray8c4e1122014-12-24 21:23:18 -0500608 when a query element is a :class:`str`).
Nick Coghlan9fc443c2010-11-30 15:48:08 +0000609
610 To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
611 provided in this module to parse query strings into Python data structures.
Senthil Kumarandf022da2010-07-03 17:48:22 +0000612
Senthil Kumaran29333122011-02-11 11:25:47 +0000613 Refer to :ref:`urllib examples <urllib-examples>` to find out how urlencode
614 method can be used for generating query string for a URL or data for POST.
615
Senthil Kumarandf022da2010-07-03 17:48:22 +0000616 .. versionchanged:: 3.2
Georg Brandl67b21b72010-08-17 15:07:14 +0000617 Query parameter supports bytes and string objects.
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000618
R David Murrayc17686f2015-05-17 20:44:50 -0400619 .. versionadded:: 3.5
620 *quote_via* parameter.
621
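The sketch below shows the effect of *quote_via*, *safe*, and *doseq*, prepares a result for use as POST *data*, and reverses the encoding with :func:`parse_qs`. The parameter names and values are hypothetical, and the dictionary example assumes dictionaries preserve insertion order (Python 3.7 and later).

```python
from urllib.parse import parse_qs, quote, urlencode

params = {'q': 'a b/c', 'lang': 'en'}

# Default quote_via=quote_plus: space -> '+', '/' -> %2F.
default_query = urlencode(params)  # 'q=a+b%2Fc&lang=en'

# quote_via=quote: space -> %20; '/' is still encoded because
# urlencode() passes safe='' to quote() by default.
quote_query = urlencode(params, quote_via=quote)  # 'q=a%20b%2Fc&lang=en'

# Adding '/' to safe keeps it literal.
quote_safe = urlencode(params, quote_via=quote, safe='/')  # 'q=a%20b/c&lang=en'

# Sequence values: without doseq the whole list is converted with str()
# and quoted; with doseq=True each element becomes its own key=value pair.
pairs = [('tag', ['x', 'y']), ('page', 2)]
flat = urlencode(pairs)  # 'tag=%5B%27x%27%2C+%27y%27%5D&page=2'
repeated = urlencode(pairs, doseq=True)  # 'tag=x&tag=y&page=2'

# urlopen() expects bytes for POST data, so encode the ASCII result.
post_data = urlencode({'name': 'José'}).encode('ascii')  # b'name=Jos%C3%A9'

# parse_qs() reverses the encoding; every value comes back as a list of str.
decoded = parse_qs(repeated)  # {'tag': ['x', 'y'], 'page': ['2']}
```

In a real request, ``post_data`` would be passed as the *data* argument of :func:`~urllib.request.urlopen`; the sketch stops short of making a network call.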

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the urllib.parse
      module should conform to it. Some deviations exist, mostly for
      backward compatibility and to meet de facto parsing requirements
      commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.