:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

**Source code:** :source:`Lib/urllib/parse.py`

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of
   the result, except for a leading slash in the *path* component, which is
   retained if present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized and are instead parsed as part of the preceding component. The
   default value for this argument is :const:`True`.
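The effect of these two arguments can be sketched as follows. This is only an illustration; the ``example.com`` URLs are placeholders and not part of the original text.

```python
from urllib.parse import urlparse

# The default scheme is applied only because the URL itself has none.
with_default = urlparse('//www.example.com/index.html', scheme='https')
print(with_default.scheme)     # https

# A scheme present in the URL takes precedence over the default.
explicit = urlparse('http://www.example.com/index.html', scheme='https')
print(explicit.scheme)         # http

# With allow_fragments=False the '#intro' part stays in the path.
no_frag = urlparse('http://www.example.com/doc.html#intro',
                   allow_fragments=False)
print(no_frag.path)            # /doc.html#intro
print(repr(no_frag.fragment))  # ''
```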

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
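A short sketch of how the attributes in the table relate to the tuple fields. The URL and its credentials are made up for illustration.

```python
from urllib.parse import urlparse

o = urlparse('http://user:secret@WWW.Example.com:8080/a/b;type=a?x=1#top')

# Indexed fields are also reachable by name.
print(o[1] == o.netloc)                        # True
print(o.path, o.params, o.query, o.fragment)   # /a/b type=a x=1 top

# Derived attributes have no tuple index; the host name is lower-cased.
print(o.username, o.password)                  # user secret
print(o.hostname, o.port)                      # www.example.com 8080

# Netloc parts that are absent are reported as None, not as ''.
bare = urlparse('http://www.example.com/')
print(bare.username, bare.port)                # None None
```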

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.
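The flags described above can be tried out with a small, made-up query string. This is a sketch, not an exhaustive account of the parser's behaviour.

```python
from urllib.parse import parse_qs, urlencode

qs = 'name=Guido&lang=python&lang=c&empty='

# Repeated names are collected into one list; blank values are dropped.
fields = parse_qs(qs)
print(fields)                  # {'name': ['Guido'], 'lang': ['python', 'c']}

# keep_blank_values=True retains 'empty' with an empty string.
with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks['empty'])    # ['']

# A field without '=' is skipped silently unless strict_parsing is true.
strict_failed = False
try:
    parse_qs('name=Guido&novalue', strict_parsing=True)
except ValueError:
    strict_failed = True
print(strict_failed)           # True

# doseq=True expands each list back into repeated key=value pairs.
round_trip = urlencode(fields, doseq=True)
print(round_trip)              # name=Guido&lang=python&lang=c
```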

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.
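Unlike :func:`parse_qs`, the list form keeps the order of the pairs and repeated names as separate entries. A minimal sketch with an invented query string:

```python
from urllib.parse import parse_qsl, urlencode

# '+' is decoded to a space; the repeated name 'a' appears twice.
pairs = parse_qsl('a=1&b=two+words&a=3')
print(pairs)              # [('a', '1'), ('b', 'two words'), ('a', '3')]

# urlencode() accepts the list of pairs directly and restores the string.
rebuilt = urlencode(pairs)
print(rebuilt)            # a=1&b=two+words&a=3
```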

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).
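For illustration, a sketch that builds a URL from six hand-written components and then shows the "unnecessary delimiter" case. The host name is a placeholder.

```python
from urllib.parse import urlparse, urlunparse

# Any six-item iterable works: scheme, netloc, path, params, query, fragment.
built = urlunparse(['https', 'www.example.com', '/index.html', '', 'q=1', 'top'])
print(built)      # https://www.example.com/index.html?q=1#top

# A trailing '?' with an empty query is not reproduced.
original = 'http://www.example.com/index.html?'
rebuilt = urlunparse(urlparse(original))
print(rebuilt)    # http://www.example.com/index.html
```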


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
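The difference from :func:`urlparse` shows up when more than one path segment carries parameters. In the sketch below, ``segment_params`` is a hypothetical helper (not part of the module) standing in for the "separate function" mentioned above.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://www.example.com/a;x=1/b;y=2?q=1#frag'

# urlsplit() leaves all ';parameters' inside the path.
split = urlsplit(url)
print(split.path)                    # /a;x=1/b;y=2

# urlparse() splits off only the parameters of the last segment.
parsed = urlparse(url)
print(parsed.path, parsed.params)    # /a;x=1/b y=2


def segment_params(path):
    """Hypothetical helper: return (segment, params) for each path segment."""
    result = []
    for segment in path.split('/'):
        if not segment:
            continue  # skip the empty piece before the leading slash
        name, _, params = segment.partition(';')
        result.append((name, params))
    return result


print(segment_params(split.path))    # [('a', 'x=1'), ('b', 'y=2')]
```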


.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
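One possible way to do the preprocessing suggested in the note is sketched below. ``join_same_host`` is a hypothetical name chosen for this example; it simply discards any scheme and network location from *url* before joining.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def join_same_host(base, url):
    """Hypothetical helper: join url to base, ignoring url's scheme and host."""
    parts = urlsplit(url)
    # Rebuild the reference with empty scheme and netloc.
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)


base = 'http://www.cwi.nl/%7Eguido/Python.html'

# Plain urljoin() lets the second argument replace the host.
print(urljoin(base, '//www.python.org/%7Eguido'))
# http://www.python.org/%7Eguido

# After preprocessing, only the path of the second argument is used.
print(join_same_host(base, '//www.python.org/%7Eguido'))
# http://www.cwi.nl/%7Eguido
```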


   .. versionchanged:: 3.5

      Behaviour updated to match the semantics defined in :rfc:`3986`.


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
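A brief sketch of both cases, using a placeholder URL:

```python
from urllib.parse import urldefrag

result = urldefrag('http://www.example.com/doc.html#section-2')
print(result.url)        # http://www.example.com/doc.html
print(result.fragment)   # section-2

# The result still unpacks like a plain 2-tuple.
url, fragment = result

# Without a fragment the URL comes back unchanged, with an empty fragment.
plain = urldefrag('http://www.example.com/doc.html')
print(plain.url, repr(plain.fragment))   # http://www.example.com/doc.html ''
```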

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).
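The rules in this section can be checked with a short sketch. The URL is a placeholder, and the two ``try`` blocks exist only to show which exception each misuse raises.

```python
from urllib.parse import urlsplit

# bytes in, bytes out
raw = urlsplit(b'http://www.example.com/path?q=1')
print(type(raw).__name__, raw.netloc)     # SplitResultBytes b'www.example.com'

# decode() yields the str flavour; encode() goes back again.
text = raw.decode()
print(type(text).__name__, text.netloc)   # SplitResult www.example.com
print(text.encode() == raw)               # True

# Mixing str and bytes arguments in one call is rejected.
try:
    urlsplit('http://www.example.com/', b'http')
except TypeError:
    mixed_rejected = True

# Non-ASCII byte values cannot be parsed.
try:
    urlsplit(b'http://www.example.com/\xff')
except UnicodeDecodeError:
    non_ascii_rejected = True
```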

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional *safe*
   parameter specifies additional ASCII characters that should not be quoted
   --- its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :exc:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes` object, or a :exc:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
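The *safe* and *encoding* parameters, and the equivalence with :func:`quote_from_bytes`, can be sketched as follows. The strings are illustrative only.

```python
from urllib.parse import quote, quote_from_bytes

# Default: '/' is left alone, everything else special is escaped.
print(quote('/El Niño/'))                  # /El%20Ni%C3%B1o/

# An empty safe set escapes the slashes as well.
print(quote('/El Niño/', safe=''))         # %2FEl%20Ni%C3%B1o%2F

# The encoding decides which bytes end up in the %xx escapes.
print(quote('Niño', encoding='latin-1'))   # Ni%F1o

# quote() on a str behaves like quote_from_bytes() on the encoded bytes.
same = quote('/El Niño/') == quote_from_bytes('/El Niño/'.encode('utf-8'), '/')
print(same)                                # True

# Supplying an encoding together with bytes input is an error.
try:
    quote(b'abc', encoding='utf-8')
except TypeError:
    bytes_with_encoding_rejected = True
```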


.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.
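A small sketch of how literal plus signs and slashes are treated, with and without *safe*. The input string is made up.

```python
from urllib.parse import quote_plus

# Space becomes '+'; a literal '+' and '/' are percent-escaped by default.
default = quote_plus('a b+c/d')
print(default)    # a+b%2Bc%2Fd

# Listing '+' and '/' in safe leaves them untouched, which makes the
# original '+' indistinguishable from an encoded space.
relaxed = quote_plus('a b+c/d', safe='+/')
print(relaxed)    # a+b+c/d
```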


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.
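The *encoding* and *errors* parameters matter when the escapes are not valid UTF-8. The sketch below uses ``%F1``, the Latin-1 byte for "ñ", as an illustrative input.

```python
from urllib.parse import unquote

# Valid UTF-8 escapes decode as expected.
print(unquote('/El%20Ni%C3%B1o/'))              # /El Niño/

# %F1 alone is not valid UTF-8, so the default errors='replace'
# substitutes the placeholder character U+FFFD.
damaged = unquote('Ni%F1o')
print(damaged == 'Ni\ufffdo')                   # True

# Naming the right encoding recovers the text.
print(unquote('Ni%F1o', encoding='latin-1'))    # Niño

# errors='strict' turns the invalid sequence into an exception.
try:
    unquote('Ni%F1o', errors='strict')
except UnicodeDecodeError:
    strict_rejected = True
```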


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes` object.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.
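A short sketch of the three input cases described above (escaped text, unescaped non-ASCII text, and bytes input). The sample values are illustrative.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# Escapes become single octets; no text decoding is attempted.
raw = unquote_to_bytes('a%26%EF')
print(raw)                      # b'a&\xef'

# An unescaped non-ASCII character in a str is encoded as UTF-8.
mixed = unquote_to_bytes('ñ%20')
print(mixed)                    # b'\xc3\xb1 '

# bytes input is accepted as well.
print(unquote_to_bytes(b'a%26%EF') == raw)   # True

# quote_from_bytes() reverses the operation.
print(quote_from_bytes(raw))    # a%26%EF
```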


.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a "percent-encoded"
   string. If the resultant string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then it should be
   encoded to bytes; otherwise it would result in a :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and ``'/'`` characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``
   and not encode ``'/'`` characters. For maximum control of what is quoted, use
   ``quote`` and specify a value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used for generating the query string of a
   URL or the data for a POST request.
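The main options can be sketched with a couple of made-up queries. Note that the example passes ``safe='/'`` explicitly together with ``quote_via=quote``; this is a deliberate choice for the illustration, since *safe* is forwarded to *quote_via* and defaults to the empty string.

```python
from urllib.parse import quote, urlencode

query = {'q': 'el niño', 'path': '/a/b'}

# Default quoting via quote_plus(): spaces become '+', '/' becomes %2F.
default = urlencode(query)
print(default)     # q=el+ni%C3%B1o&path=%2Fa%2Fb

# quote() with an explicit safe set: spaces become %20, '/' is kept.
relaxed = urlencode(query, safe='/', quote_via=quote)
print(relaxed)     # q=el%20ni%C3%B1o&path=/a/b

# doseq=True emits one key=value pair per element of a sequence value.
multi = urlencode([('tag', ['a', 'b']), ('n', 1)], doseq=True)
print(multi)       # tag=a&tag=b&n=1

# For use as POST data the string has to be encoded to bytes first.
post_data = default.encode('ascii')
print(post_data)   # b'q=el+ni%C3%B1o&path=%2Fa%2Fb'
```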

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

   .. versionadded:: 3.5
      *quote_via* parameter.


.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.