:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

**Source code:** :source:`Lib/urllib/parse.py`

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple.  This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   % escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained
   if present.  For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse`
   recognizes a netloc only if it is properly introduced by ``//``.  Otherwise
   the input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one.  It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized.  Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is actually an instance of a subclass of :class:`tuple`.  This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`.  Previously, a whitelist of
      schemes that support fragments existed.

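The effect of the *scheme* and *allow_fragments* arguments of
:func:`urlparse` can be seen in a short sketch. The URLs are illustrative
only (``example.com`` is an assumption, not taken from the text above), and
the values in the comments follow from the behaviour described for
:func:`urlparse`.

```python
from urllib.parse import urlparse

# The input has no scheme, so the *scheme* argument supplies the default.
no_scheme = urlparse('//www.cwi.nl:80/%7Eguido/Python.html', scheme='https')
print(no_scheme.scheme)    # https
print(no_scheme.hostname)  # www.cwi.nl
print(no_scheme.port)      # 80

# An explicit scheme in the URL takes precedence over the default.
explicit = urlparse('http://www.cwi.nl/', scheme='https')
print(explicit.scheme)     # http

# With allow_fragments=False the '#...' text stays in the preceding component.
kept = urlparse('http://example.com/doc/page.html#section-2',
                allow_fragments=False)
print(kept.path)           # /doc/page.html#section-2
print(repr(kept.fragment)) # ''
```

Note that ``hostname`` and ``port`` are derived from ``netloc`` on access,
which is why they carry no tuple index in the table above.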

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
   dictionary.  The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings.  A true value
   indicates that blanks should be retained as blank strings.  The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors.  If false (the default), errors are silently ignored.  If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.


   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

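A minimal sketch of :func:`parse_qs` and its two flags follows. The query
string is made up for illustration; the last step shows the round trip
through :func:`urlencode` with ``doseq=True`` mentioned above.

```python
from urllib.parse import parse_qs, urlencode

query = 'name=Guido&lang=python&lang=c&empty='

# Repeated names are collected into one list; blank values are dropped.
parsed = parse_qs(query)
print(parsed)       # {'name': ['Guido'], 'lang': ['python', 'c']}

# keep_blank_values=True retains 'empty' with an empty-string value.
with_blanks = parse_qs(query, keep_blank_values=True)
print(with_blanks['empty'])  # ['']

# strict_parsing=True turns a malformed field (no '=') into a ValueError.
strict_error = None
try:
    parse_qs('name=Guido&novalue', strict_parsing=True)
except ValueError as exc:
    strict_error = exc
print(type(strict_error).__name__)  # ValueError

# doseq=True expands each list back into repeated key=value pairs.
rebuilt = urlencode(parsed, doseq=True)
print(rebuilt)      # name=Guido&lang=python&lang=c
```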

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings.  A true value
   indicates that blanks should be retained as blank strings.  The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors.  If false (the default), errors are silently ignored.  If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

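Unlike :func:`parse_qs`, :func:`parse_qsl` keeps the original order and
every duplicate, which matters when the query is re-encoded. A small sketch
with made-up query strings:

```python
from urllib.parse import parse_qsl, urlencode

# Order and duplicates are preserved as a list of (name, value) tuples.
pairs = parse_qsl('b=2&a=1&b=3')
print(pairs)        # [('b', '2'), ('a', '1'), ('b', '3')]

# Plus signs and %xx escapes are decoded (UTF-8 by default).
decoded = parse_qsl('q=caf%C3%A9+au+lait')
print(decoded)      # [('q', 'café au lait')]

# A list of pairs can be passed straight back to urlencode().
round_trip = urlencode(pairs)
print(round_trip)   # b=2&a=1&b=3
```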

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted.  A separate function is needed to
   separate the path segments and parameters.  This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`.  This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

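The difference between :func:`urlsplit` and :func:`urlparse` shows up when
path segments carry ``;`` parameters. In this sketch (the URL is an
illustrative assumption) :func:`urlsplit` leaves the path untouched, while
:func:`urlparse` separates only the parameters of the last segment.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a;x=1/b;y=2?q=3#frag'

# urlsplit() returns five items and keeps every ';...' inside the path.
split = urlsplit(url)
print(len(split))     # 5
print(split.path)     # /a;x=1/b;y=2

# urlparse() returns six items and strips the last segment's parameters.
parsed = urlparse(url)
print(len(parsed))    # 6
print(parsed.path)    # /a;x=1/b
print(parsed.params)  # y=2
```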

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*).  Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL.  For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result.  For example:

   .. doctest::

      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
      ...         '//www.python.org/%7Eguido')
      'http://www.python.org/%7Eguido'

   If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
   :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

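The preprocessing step suggested for :func:`urljoin` can be sketched as
follows. ``join_same_host`` is a hypothetical helper name introduced here
for illustration; it is not part of :mod:`urllib.parse`.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_same_host(base, url):
    """Hypothetical helper: join *url* onto *base*, ignoring any scheme
    or network location that *url* itself carries."""
    parts = urlsplit(url)
    # Rebuild the URL with empty scheme and netloc, keeping the rest.
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)


base = 'http://www.cwi.nl/%7Eguido/Python.html'

# Plain urljoin() lets the second argument replace the host.
plain = urljoin(base, '//www.python.org/%7Eguido')
print(plain)    # http://www.python.org/%7Eguido

# After stripping scheme and netloc, the base host is kept.
pinned = join_same_host(base, '//www.python.org/%7Eguido')
print(pinned)   # http://www.cwi.nl/%7Eguido
```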

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string.  If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`.  This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

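A short sketch of :func:`urldefrag`, using both attribute access and tuple
unpacking on the result. The URLs are illustrative.

```python
from urllib.parse import urldefrag

# With a fragment: the two parts are available as attributes.
result = urldefrag('http://www.python.org/doc/index.html#contents')
print(result.url)       # http://www.python.org/doc/index.html
print(result.fragment)  # contents

# Without a fragment: the URL comes back unmodified with an empty string.
url, fragment = urldefrag('http://www.python.org/doc/index.html')
print(url)              # http://www.python.org/doc/index.html
print(repr(fragment))   # ''
```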
.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing functions.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

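The rules for ASCII encoded bytes can be exercised with :func:`urlsplit`.
This is a sketch only; the URLs are illustrative, and the two ``try``
blocks show the :exc:`TypeError` and :exc:`UnicodeDecodeError` cases
described above.

```python
from urllib.parse import SplitResult, SplitResultBytes, urlsplit

text_result = urlsplit('http://www.python.org/doc/?q=1')
bytes_result = urlsplit(b'http://www.python.org/doc/?q=1')

# bytes in, bytes out
print(type(bytes_result).__name__)  # SplitResultBytes
print(bytes_result.netloc)          # b'www.python.org'

# decode() and encode() convert between the two result types.
print(bytes_result.decode() == text_result)  # True
print(text_result.encode() == bytes_result)  # True

# Mixing str and bytes arguments in one call is rejected.
mixing_error = None
try:
    urlsplit('http://www.python.org/', scheme=b'http')
except TypeError as exc:
    mixing_error = exc

# Non-ASCII byte values cannot be parsed directly.
non_ascii_error = None
try:
    urlsplit(b'http://www.python.org/caf\xc3\xa9')
except UnicodeDecodeError as exc:
    non_ascii_error = exc

print(type(mixing_error).__name__)     # TypeError
print(type(non_ascii_error).__name__)  # UnicodeDecodeError
```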

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional *safe*
   parameter specifies additional ASCII characters that should not be quoted
   --- its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes` object, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

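The interaction of *safe* and *encoding* in :func:`quote` is easiest to see
side by side. The sketch below reuses the ``'/El Niño/'`` example; the
Latin-1 variants are added here for illustration.

```python
from urllib.parse import quote

# Default: '/' is safe, non-ASCII text is encoded as UTF-8.
default = quote('/El Niño/')
print(default)      # /El%20Ni%C3%B1o/

# An empty *safe* string means '/' is quoted as well.
no_safe = quote('/El Niño/', safe='')
print(no_safe)      # %2FEl%20Ni%C3%B1o%2F

# A different *encoding* changes the bytes behind the %xx escapes.
latin1 = quote('/El Niño/', encoding='latin-1')
print(latin1)       # /El%20Ni%F1o/

# bytes input is quoted byte for byte, with no encoding step.
from_bytes = quote(b'/El Ni\xf1o/')
print(from_bytes)   # /El%20Ni%F1o/

# Supplying *encoding* together with bytes input raises TypeError.
bytes_error = None
try:
    quote(b'/El Ni\xf1o/', encoding='latin-1')
except TypeError as exc:
    bytes_error = exc
print(type(bytes_error).__name__)  # TypeError
```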

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*.  It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.

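The following sketch contrasts :func:`unquote` and :func:`unquote_plus` and
shows what the default ``errors='replace'`` does with a sequence that is not
valid UTF-8. The ``%F1`` input is an illustrative Latin-1 encoded string.

```python
from urllib.parse import unquote, unquote_plus

# Valid UTF-8 escapes decode to the original text.
text = unquote('/El%20Ni%C3%B1o/')
print(text)                # /El Niño/

# %F1 alone is not valid UTF-8, so a placeholder (U+FFFD) is substituted.
replaced = unquote('/El%20Ni%F1o/')
print(ascii(replaced))     # '/El Ni\ufffdo/'

# Naming the right encoding recovers the character.
recovered = unquote('/El%20Ni%F1o/', encoding='latin-1')
print(recovered)           # /El Niño/

# Only unquote_plus() treats '+' as a space.
kept_plus = unquote('a+b')
as_space = unquote_plus('a+b')
print(kept_plus, '|', as_space)  # a+b | a b
```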

.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

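A brief sketch of :func:`unquote_to_bytes` with both input types. The third
call illustrates the UTF-8 encoding of unescaped non-ASCII characters in a
:class:`str` argument; that input string is an assumption chosen for the
example.

```python
from urllib.parse import unquote_to_bytes

# str and bytes inputs give the same octets for pure-ASCII input.
from_str = unquote_to_bytes('a%26%EF')
from_bytes = unquote_to_bytes(b'a%26%EF')
print(from_str)     # b'a&\xef'
print(from_bytes)   # b'a&\xef'

# An unescaped 'ñ' in a str is encoded as UTF-8 (two bytes).
mixed = unquote_to_bytes('Ni\u00f1o%20')
print(mixed)        # b'Ni\xc3\xb1o '
```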

.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string.  If the resultant string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`
   above.  When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key.  The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   :func:`quote_plus` (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used to generate a query string for a URL
   or data for a POST request.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

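The *doseq* and *safe* parameters of :func:`urlencode` are sketched below
with made-up keys and values. The last step shows the ``bytes`` conversion
needed before the string is used as POST data; no request is actually sent.

```python
from urllib.parse import urlencode

# Keys and values are quoted with quote_plus(); non-str values use str().
simple = urlencode([('q', 'El Niño'), ('page', 2)])
print(simple)        # q=El+Ni%C3%B1o&page=2

# Without doseq, a list value is quoted as its str() representation.
not_expanded = urlencode([('tag', ['a', 'b'])])
print(not_expanded)  # tag=%5B%27a%27%2C+%27b%27%5D

# With doseq=True, each element becomes its own key=value pair.
expanded = urlencode([('tag', ['a', 'b'])], doseq=True)
print(expanded)      # tag=a&tag=b

# *safe* is passed down to quote_plus().
with_safe = urlencode({'path': '/a/b'}, safe='/')
print(with_safe)     # path=/a/b

# POST data must be bytes; the encoded string is pure ASCII.
post_data = simple.encode('ascii')
print(post_data)     # b'q=El+Ni%C3%B1o&page=2'
```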

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.