:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``, ``ws``, ``wss``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of
   the result, except for a leading slash in the *path* component, which is
   retained if present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse`
   recognizes a netloc only if it is properly introduced by ``//``. Otherwise
   the input is presumed to be a relative URL and thus to start with
   a path component.

   .. doctest::
      :options: +NORMALIZE_WHITESPACE

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

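As a small illustrative sketch (the URLs are made-up values in the style of the
examples above), the following shows how the *scheme* and *allow_fragments*
arguments affect the result, and that an out-of-range port is only reported
when the :attr:`port` attribute is read:

```python
from urllib.parse import urlparse

# No leading '//', so the host-looking text stays in the path; the default
# scheme is applied because the URL does not specify one.
relative = urlparse('www.cwi.nl/%7Eguido/Python.html', scheme='http')
print(relative.scheme, repr(relative.netloc), relative.path)

# With allow_fragments=False the '#intro' suffix is left in the path.
no_frag = urlparse('http://www.cwi.nl/doc/index.html#intro',
                   allow_fragments=False)
print(no_frag.path, repr(no_frag.fragment))

# Parsing succeeds even with a bad port; the error surfaces on access.
bad_port = urlparse('http://www.cwi.nl:99999/')
port_error = None
try:
    bad_port.port
except ValueError as exc:
    port_error = exc
    print('invalid port:', exc)
```

Deferring the port check to attribute access means a caller that never reads
:attr:`port` is not affected by a malformed port in the input.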

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

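A brief sketch of the flags described above, using an invented query string;
the field names are assumptions chosen only for illustration:

```python
from urllib.parse import parse_qs, urlencode

qs = 'name=Guido&lang=python&lang=c&empty='

# Blank values are dropped by default and kept on request.
default = parse_qs(qs)
with_blanks = parse_qs(qs, keep_blank_values=True)
print(default)       # 'empty' is absent
print(with_blanks)   # 'empty' maps to ['']

# strict_parsing=True turns a malformed field (no '=') into an error.
try:
    parse_qs('name=Guido&broken', strict_parsing=True)
    strict_failed = False
except ValueError:
    strict_failed = True

# doseq=True expands each list back into repeated name=value pairs.
rebuilt = urlencode(default, doseq=True)
print(rebuilt)
```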

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

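The list form keeps the original order and repeated names, which a dictionary
keyed by name cannot. A minimal sketch with an invented query string:

```python
from urllib.parse import parse_qsl, urlencode

# Percent escapes and '+' are decoded; order and duplicates are preserved.
pairs = parse_qsl('a=1&b=2&a=3&q=El+Ni%C3%B1o')
print(pairs)

# urlencode() accepts the list of pairs directly and rebuilds the string.
round_trip = urlencode(pairs)
print(round_trip)
```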

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

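The sketch below illustrates both points: building a URL from an ordinary
tuple, and the loss of an unnecessary ``?`` on a round trip. The last part
relies on the result being a named tuple, whose ``_replace`` method comes from
:func:`collections.namedtuple`; the URLs are illustrative values.

```python
from urllib.parse import urlparse, urlunparse

# Any six-item iterable works, not only a ParseResult.
built = urlunparse(('https', 'www.python.org', '/doc/', '', 'q=1', 'top'))
print(built)

# A trailing '?' with an empty query is dropped on reassembly.
original = 'http://www.cwi.nl/%7Eguido/Python.html?'
rebuilt = urlunparse(urlparse(original))
print(rebuilt)

# One field can be swapped before rebuilding the URL.
parts = urlparse('http://www.cwi.nl/%7Eguido/Python.html')
secure = urlunparse(parts._replace(scheme='https'))
print(secure)
```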

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.

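To make the difference from :func:`urlparse` concrete, here is a sketch with a
hypothetical URL that carries parameters on two path segments. The
``segment_params`` helper is not part of this module; it is only one possible
shape for the "separate function" mentioned above.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=1'   # hypothetical URL

split = urlsplit(url)    # path keeps every ';...' parameter
parsed = urlparse(url)   # only the last segment's parameters are split off
print(split.path)
print(parsed.path, parsed.params)


def segment_params(path):
    """Hypothetical helper: split each path segment at its first ';'."""
    # partition() yields (name, ';', params); [::2] keeps name and params.
    return [segment.partition(';')[::2] for segment in path.split('/')]


print(segment_params(split.path))
```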

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

   .. versionchanged:: 3.5

      Behaviour updated to match the semantics defined in :rfc:`3986`.

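The preprocessing suggested in the note could be sketched as follows. The
``join_same_host`` name is a hypothetical helper introduced here for
illustration and is not provided by this module:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_same_host(base, url):
    """Hypothetical helper: join *url* to *base*, ignoring url's own host."""
    parts = urlsplit(url)
    # Rebuild the URL with empty scheme and netloc so only the path,
    # query and fragment are resolved against the base URL.
    local = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, local)


base = 'http://www.cwi.nl/%7Eguido/Python.html'
print(urljoin(base, '//www.python.org/%7Eguido'))         # host replaced
print(join_same_host(base, '//www.python.org/%7Eguido'))  # host kept
print(join_same_host(base, 'FAQ.html'))
```

Whether dropping the host is the right policy depends on the application; a
stricter variant might reject such URLs instead of rewriting them.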

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

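A short sketch of the rules in this section, reusing the URL from the
:func:`urlparse` example; the two failing calls are contrived inputs chosen to
trigger the errors described above:

```python
from urllib.parse import urlparse

text_result = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
bytes_result = urlparse(b'http://www.cwi.nl:80/%7Eguido/Python.html')
print(type(bytes_result).__name__, bytes_result.netloc)

# encode() and decode() convert a whole result at once (ASCII by default).
encoded = text_result.encode()
decoded = bytes_result.decode()

# Mixing str and bytes arguments in one call is rejected.
try:
    urlparse('http://www.cwi.nl/', scheme=b'http')
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# Non-ASCII byte values cannot be decoded by the parsing functions.
try:
    urlparse(b'http://www.cwi.nl/\xff')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(mixed_error).__name__, type(decode_error).__name__)
```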

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'

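Because the results are tuple subclasses, they can be indexed and unpacked
like ordinary tuples as well as read by attribute. A minimal sketch, using
illustrative URLs:

```python
from urllib.parse import SplitResult, urldefrag, urlsplit

result = urlsplit('http://www.python.org/doc/?q=1#top')

# Plain tuple unpacking and indexing agree with the named attributes.
scheme, netloc, path, query, fragment = result
print(result[1] == result.netloc == netloc)
print(isinstance(result, tuple), isinstance(result, SplitResult))

# urldefrag() results expose url and fragment, plus geturl().
defragged = urldefrag('http://www.python.org/doc/#top')
print(defragged.url, defragged.fragment)
print(defragged.geturl())
```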

The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-~'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   .. versionchanged:: 3.7
      Moved from RFC 2396 to RFC 3986 for quoting URL strings. "~" is now
      included in the set of unreserved characters.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes` object, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

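The effect of *safe* and *encoding* can be seen in a short sketch that builds
on the example string above. The ``'~guido'`` check assumes Python 3.7 or
later, where ``~`` is never quoted:

```python
from urllib.parse import quote, quote_from_bytes

text = '/El Niño/'

default = quote(text)                      # '/' is safe by default
no_safe = quote(text, safe='')             # now '/' is quoted as well
latin1 = quote(text, encoding='latin-1')   # 'ñ' becomes a single byte
print(default, no_safe, latin1)

# The documented equivalence with quote_from_bytes().
via_bytes = quote_from_bytes(text.encode('utf-8', 'strict'), '/')
print(via_bytes == default)

tilde = quote('~guido')                    # unchanged on Python 3.7+

# Supplying an encoding together with bytes input is an error.
try:
    quote(b'abc', encoding='utf-8')
    bytes_error = None
except TypeError as exc:
    bytes_error = exc
```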

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.

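The following sketch contrasts :func:`unquote` with :func:`unquote_plus` and
shows the effect of the *encoding* and *errors* parameters. The inputs are
illustrative; ``'Ni%F1o'`` is a Latin-1 escape that is deliberately decoded
with the wrong (default) encoding:

```python
from urllib.parse import unquote, unquote_plus

utf8 = unquote('/El%20Ni%C3%B1o/')
latin1 = unquote('/El%20Ni%F1o/', encoding='latin-1')

# %F1 alone is not valid UTF-8, so errors='replace' substitutes U+FFFD.
replaced = unquote('Ni%F1o')

# errors='strict' raises instead of substituting a placeholder.
try:
    unquote('Ni%F1o', errors='strict')
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# Only unquote_plus() treats '+' as an encoded space.
plus_kept = unquote('El+Ni%C3%B1o')
plus_decoded = unquote_plus('El+Ni%C3%B1o')
print(utf8, latin1, plus_kept, plus_decoded)
```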

.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

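A brief sketch of the input types accepted, including the UTF-8 encoding of
unescaped non-ASCII characters; the third input is an invented value:

```python
from urllib.parse import unquote_to_bytes

from_str = unquote_to_bytes('a%26%EF')
from_bytes = unquote_to_bytes(b'a%26%EF')

# The unescaped 'ñ' is encoded as UTF-8 (two bytes); %20 becomes a space.
non_ascii = unquote_to_bytes('Ni\u00f1o%20x')
print(from_str, from_bytes, non_ascii)
```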

.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resultant string is to be used as the *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``
   and, when ``'/'`` is included in *safe*, not encode ``'/'`` characters. For
   maximum control of what is quoted, use ``quote`` and specify a value for
   *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used for generating a query string for a
   URL or data for a POST request.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

   .. versionadded:: 3.5
      *quote_via* parameter.

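A small sketch of the options described above. The field names and values are
invented for illustration. Note that :func:`urlencode` passes its own *safe*
value (``''`` by default) to *quote_via*, so ``safe='/'`` is given explicitly
here to leave slashes unquoted:

```python
from urllib.parse import quote, urlencode

query = {'q': 'El Niño', 'lang': ['python', 'c']}

# doseq=True emits one pair per element of a sequence value.
form = urlencode(query, doseq=True)
print(form)

# Default quoting (quote_plus): space -> '+', '/' -> '%2F'.
default = urlencode([('path', '/doc/ x')])

# quote as quote_via: space -> '%20'; '/' is kept because safe='/'.
with_quote = urlencode([('path', '/doc/ x')], safe='/', quote_via=quote)
print(default, with_quote)

# POST data must be bytes, so encode the resulting ASCII string.
post_data = form.encode('ascii')
```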

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the
      :mod:`urllib.parse` module should conform to this. Certain deviations
      could be observed, which are mostly for backward compatibility purposes
      and for certain de-facto parsing requirements as commonly observed in
      major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.