:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.

**Source code:** :source:`Lib/urllib/parse.py`

.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``, ``ws``, ``wss``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   % escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse`
   recognizes a netloc only if it is properly introduced by ``//``. Otherwise
   the input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                 params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                 params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                 query='', fragment='')

   The *scheme* argument gives the default addressing scheme, to be
   used only if the URL does not specify one. It should be the same type
   (text or bytes) as *urlstring*, except that the default value ``''`` is
   always allowed, and is automatically converted to ``b''`` if appropriate.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the path, parameters
   or query component, and :attr:`fragment` is set to the empty string in
   the return value.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | *scheme* parameter   |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.


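The *scheme* and *allow_fragments* arguments of :func:`urlparse`, and the
lazy validation of :attr:`port`, can be illustrated with a short sketch. The
URLs are made-up sample values chosen only for this example.

```python
from urllib.parse import urlparse

# The scheme argument is only a fallback: it applies when the URL has none.
with_default = urlparse('//www.cwi.nl/%7Eguido/Python.html', scheme='https')
explicit = urlparse('http://www.cwi.nl/%7Eguido/Python.html', scheme='https')
print(with_default.scheme, explicit.scheme)

# With allow_fragments=False the '#intro' text stays in the path component.
no_frag = urlparse('http://www.cwi.nl/doc/index.html#intro',
                   allow_fragments=False)
print(repr(no_frag.path), repr(no_frag.fragment))

# Parsing succeeds even with an out-of-range port; only reading .port raises.
bad_port = urlparse('http://www.cwi.nl:99999/')
try:
    bad_port.port
    port_error = None
except ValueError as exc:
    port_error = exc
print(type(port_error).__name__)
```

Because the port is checked on attribute access, code that only needs
``netloc`` can still handle such URLs without catching an exception.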
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.


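A minimal sketch of how *keep_blank_values* and *strict_parsing* affect
:func:`parse_qs`, and of the round trip through :func:`urlencode` with
``doseq=True``. The query string is an invented sample.

```python
from urllib.parse import parse_qs, urlencode

qs = 'name=Guido&lang=python&lang=c&empty='

# Repeated names are collected into one list; blank values are dropped.
default = parse_qs(qs)
print(default)

# keep_blank_values=True retains 'empty' with an empty-string value.
with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks)

# doseq=True expands each list back into separate name=value pairs.
round_trip = urlencode(with_blanks, doseq=True)
print(round_trip)

# A field without '=' is silently skipped unless strict_parsing is true.
try:
    parse_qs('name', strict_parsing=True)
    strict_error = None
except ValueError as exc:
    strict_error = exc
print(type(strict_error).__name__)
```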
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.


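As a small illustration, :func:`parse_qsl` keeps the pairs in their original
order, and the *encoding* argument decides how percent-encoded bytes become
characters. The sample strings below are assumptions made for the example.

```python
from urllib.parse import parse_qsl

# Order and duplicates are preserved, unlike the dictionary from parse_qs().
pairs = parse_qsl('b=2&a=1&b=3')
print(pairs)

# %F6 is not valid UTF-8 on its own, so the default errors='replace'
# substitutes U+FFFD; decoding as Latin-1 recovers the intended letter.
as_utf8 = parse_qsl('city=K%F6ln')
as_latin1 = parse_qsl('city=K%F6ln', encoding='latin-1')
print(as_utf8, as_latin1)
```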
.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).


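The following sketch shows :func:`urlunparse` dropping an unnecessary ``?``
delimiter, and building a URL from a plain six-item tuple. The use of the
named tuple's ``_replace`` method to swap components is one possible
approach, not something this section prescribes.

```python
from urllib.parse import urlparse, urlunparse

# The trailing '?' introduces an empty query, so it is not reproduced.
parts = urlparse('http://www.cwi.nl/%7Eguido/Python.html?')
rebuilt = urlunparse(parts)
print(rebuilt)

# Result objects are named tuples, so components can be swapped before
# reassembly.
swapped = urlunparse(parts._replace(scheme='https', query='lang=en'))
print(swapped)

# Any six-item iterable works: scheme, netloc, path, params, query, fragment.
built = urlunparse(('https', 'www.python.org', '/doc/', '', 'q=1', 'top'))
print(built)
```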
.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | *scheme* parameter   |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   Reading the :attr:`port` attribute will raise a :exc:`ValueError` if
   an invalid port is specified in the URL. See section
   :ref:`urlparse-result-object` for more information on the result object.

   .. versionchanged:: 3.6
      Out-of-range port numbers now raise :exc:`ValueError`, instead of
      returning :const:`None`.


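The difference between :func:`urlsplit` and :func:`urlparse` shows up when a
path carries ``;parameters`` on more than one segment. A brief sketch, using
an invented URL:

```python
from urllib.parse import urlparse, urlsplit

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=1#top'

# urlparse() splits off only the parameters of the last path segment.
parsed = urlparse(url)
print(parsed.path, parsed.params)

# urlsplit() leaves the path intact, so per-segment parameters survive.
split = urlsplit(url)
print(split.path, len(split))
```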
.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.


   .. versionchanged:: 3.5

      Behaviour updated to match the semantics defined in :rfc:`3986`.


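The preprocessing suggested in the note for :func:`urljoin` could look
roughly like this. ``join_same_host`` is a hypothetical helper name
introduced only for this sketch; it is not part of :mod:`urllib.parse`.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

base = 'http://www.cwi.nl/%7Eguido/Python.html'


def join_same_host(base, url):
    """Hypothetical helper: join *url* to *base*, ignoring url's scheme and host."""
    parts = urlsplit(url)
    # Rebuild the reference with empty scheme and netloc so that only the
    # path, query and fragment take part in the join.
    local = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, local)


# Plain urljoin() lets the second argument replace the host.
print(urljoin(base, '//www.python.org/%7Eguido'))
# The helper keeps the scheme and host of the base URL.
print(join_same_host(base, '//www.python.org/%7Eguido'))
# Ordinary relative references are resolved as usual.
print(join_same_host(base, '../index.html'))
```

Whether discarding the host is appropriate depends on the application; the
helper simply makes that choice explicit.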
.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

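A short sketch of :func:`urldefrag`, showing that the result can be read
either through its attributes or by ordinary tuple unpacking. The URLs are
sample values.

```python
from urllib.parse import urldefrag

result = urldefrag('http://www.cwi.nl/doc/index.html#intro')
print(result.url, result.fragment)

# The result is still a 2-tuple, so unpacking works as well.
url, fragment = result
print(url, fragment)

# Without a fragment the URL comes back unchanged with an empty fragment.
plain = urldefrag('http://www.cwi.nl/doc/index.html')
print(plain)
```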
.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


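The rules for :class:`bytes` input described in this section can be sketched
as follows. The URLs are sample values, and the error handling is only one
way to observe the exceptions.

```python
from urllib.parse import urlsplit, SplitResult, SplitResultBytes

text = urlsplit('http://www.cwi.nl/%7Eguido/')
raw = urlsplit(b'http://www.cwi.nl/%7Eguido/')
print(type(text).__name__, type(raw).__name__)

# decode() and encode() convert between the two result types.
print(raw.decode() == text, text.encode() == raw)

# Mixing str and bytes arguments in one call is rejected.
try:
    urlsplit('http://www.cwi.nl/', scheme=b'http')
    mix_error = None
except TypeError as exc:
    mix_error = exc

# Non-ASCII byte values cannot be parsed directly.
try:
    urlsplit(b'http://www.cwi.nl/\xef')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
print(type(mix_error).__name__, type(ascii_error).__name__)
```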
.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


441
442URL Quoting
443-----------
444
445The URL quoting functions focus on taking program data and making it safe
446for use as URL components by quoting special characters and appropriately
447encoding non-ASCII text. They also support reversing these operations to
448recreate the original data from the contents of a URL component if that
449task isn't already covered by the URL parsing functions above.
Georg Brandl7f01a132009-09-16 15:58:14 +0000450
451.. function:: quote(string, safe='/', encoding=None, errors=None)
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000452
453 Replace special characters in *string* using the ``%xx`` escape. Letters,
Senthil Kumaran8aa8bbe2009-08-31 16:43:45 +0000454 digits, and the characters ``'_.-'`` are never quoted. By default, this
455 function is intended for quoting the path section of URL. The optional *safe*
Guido van Rossum52dbbb92008-08-18 21:44:30 +0000456 parameter specifies additional ASCII characters that should not be quoted
457 --- its default value is ``'/'``.
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000458
Guido van Rossum52dbbb92008-08-18 21:44:30 +0000459 *string* may be either a :class:`str` or a :class:`bytes`.
460
461 The optional *encoding* and *errors* parameters specify how to deal with
462 non-ASCII characters, as accepted by the :meth:`str.encode` method.
463 *encoding* defaults to ``'utf-8'``.
464 *errors* defaults to ``'strict'``, meaning unsupported characters raise a
465 :class:`UnicodeEncodeError`.
466 *encoding* and *errors* must not be supplied if *string* is a
467 :class:`bytes`, or a :class:`TypeError` is raised.
468
469 Note that ``quote(string, safe, encoding, errors)`` is equivalent to
470 ``quote_from_bytes(string.encode(encoding, errors), safe)``.
471
472 Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
Senthil Kumaranaca8fd72008-06-23 04:41:59 +0000473
474
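The effect of the *safe* and *encoding* arguments of :func:`quote` can be
seen in a few calls on sample strings (a sketch, not an exhaustive list of
cases):

```python
from urllib.parse import quote

# Default: '/' is safe, other special characters are percent-encoded as UTF-8.
print(quote('/El Niño/'))

# An empty safe string quotes the slashes too.
print(quote('/El Niño/', safe=''))

# A different encoding changes the bytes behind the escapes.
print(quote('/El Niño/', encoding='latin-1'))

# Extra safe characters are passed through untouched.
print(quote('a&b=c', safe='&='))

# encoding must not be combined with bytes input.
try:
    quote(b'abc', encoding='utf-8')
    bytes_error = None
except TypeError as exc:
    bytes_error = exc
print(type(bytes_error).__name__)
```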
.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.


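How *encoding* and *errors* change the result of :func:`unquote` is easiest
to see with an escape that is not valid UTF-8. A minimal sketch with a
sample string:

```python
from urllib.parse import unquote

# Valid UTF-8 escapes decode to the expected characters.
print(unquote('/El%20Ni%C3%B1o/'))

# %F1 alone is not valid UTF-8; errors='replace' yields U+FFFD.
replaced = unquote('Ni%F1o')
print(ascii(replaced))

# Decoding the same escape as Latin-1 gives the intended letter.
print(unquote('Ni%F1o', encoding='latin-1'))

# errors='strict' raises instead of substituting a placeholder.
try:
    unquote('Ni%F1o', errors='strict')
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc
print(type(strict_error).__name__)
```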
.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.


.. function:: urlencode(query, doseq=False, safe='', encoding=None, \
                        errors=None, quote_via=quote_plus)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a percent-encoded ASCII
   text string. If the resultant string is to be used as the *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, then
   it should be encoded to bytes, otherwise it would result in a
   :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using the *quote_via*
   function. By default, :func:`quote_plus` is used to quote the values, which
   means spaces are quoted as a ``'+'`` character and '/' characters are
   encoded as ``%2F``, which follows the standard for GET requests
   (``application/x-www-form-urlencoded``). An alternate function that can be
   passed as *quote_via* is :func:`quote`, which will encode spaces as ``%20``.
   Because *safe* is passed down to *quote_via* and defaults to ``''``, '/'
   characters are left unencoded only when *safe* includes them. For maximum
   control of what is quoted, use ``quote`` and specify a value for *safe*.

   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   *quote_via* (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used for generating the query string for a
   URL or the data for a POST request.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

   .. versionadded:: 3.5
      *quote_via* parameter.


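A hedged sketch of the main :func:`urlencode` options: *doseq* for sequence
values, *quote_via* together with *safe* for controlling what is quoted, and
encoding the result to bytes for use as POST data. The query contents are
invented for the example.

```python
from urllib.parse import urlencode, quote

query = [('q', 'El Niño'), ('path', '/a/b'), ('tag', ['x', 'y'])]

# Default quoting uses quote_plus(): spaces become '+', '/' becomes %2F.
# doseq=True emits one tag=... pair per element of the list.
form_style = urlencode(query, doseq=True)
print(form_style)

# quote_via=quote encodes spaces as %20; safe='/' is passed through to
# quote() so that slashes are left alone.
path_style = urlencode(query, doseq=True, safe='/', quote_via=quote)
print(path_style)

# The result is an ASCII str; encode it before using it as POST data.
post_data = form_style.encode('ascii')
print(type(post_data).__name__)
```

The sequence-of-tuples form is used here so that the parameter order in the
output is explicit.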
.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the
      :mod:`urllib.parse` module should conform to this. Certain deviations
      could be observed, which are mostly for backward compatibility purposes
      and for certain de-facto parsing requirements as commonly observed in
      major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.