:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

**Source code:** :source:`Lib/urllib/parse.py`

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators. It supports the following URL schemes: ``file``, ``ftp``,
``gopher``, ``hdl``, ``http``, ``https``, ``imap``, ``mailto``, ``mms``,
``news``, ``nntp``, ``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``,
``shttp``, ``sip``, ``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``,
``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up into
   smaller parts (for example, the network location is a single string), and ``%``
   escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')  # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')  # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')  # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized. Instead, they are parsed as part of the preceding component
   (the path, parameters or query). The default value for this argument is
   :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

   .. versionchanged:: 3.3
      The fragment is now parsed for all URL schemes (unless *allow_fragments* is
      false), in accordance with :rfc:`3986`. Previously, a whitelist of
      schemes that support fragments existed.

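The following is a minimal sketch of how the *scheme* default, ``allow_fragments=False`` and the convenience attributes of the :func:`urlparse` result behave. The URLs and the ``alice:s3cret`` credentials are made-up placeholders:

```python
from urllib.parse import urlparse

# *scheme* is only a default: it applies when the URL has no scheme of its own.
default_used = urlparse('//www.python.org/doc/', scheme='https')
default_ignored = urlparse('http://www.python.org/doc/', scheme='https')
print(default_used.scheme, default_ignored.scheme)  # https http

# With allow_fragments=False, '#intro' is not split off; it stays in the
# preceding component, which here is the query.
no_frag = urlparse('http://www.python.org/doc/?q=1#intro', allow_fragments=False)
print(repr(no_frag.query), repr(no_frag.fragment))  # 'q=1#intro' ''

# Convenience attributes derived from the netloc (placeholder credentials).
r = urlparse('https://alice:s3cret@Example.COM:8443/path?x=1#top')
print(r.username, r.password)  # alice s3cret
print(r.hostname, r.port)      # example.com 8443
print(r[4])                    # x=1  (index 4 is the query component)
```

Note that :attr:`hostname` is lower-cased, while :attr:`netloc` keeps the original spelling.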

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
   parameter set to ``True``) to convert such dictionaries into query
   strings.


   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

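The following minimal sketch shows *keep_blank_values*, *strict_parsing* and the round trip through :func:`urlencode` with ``doseq=True``. The query string is an arbitrary example:

```python
from urllib.parse import parse_qs, urlencode

qs = 'lang=en&tag=python&tag=urllib&empty='

# Repeated names collect into lists; blank values are dropped by default.
print(parse_qs(qs))  # {'lang': ['en'], 'tag': ['python', 'urllib']}

# keep_blank_values=True retains 'empty' with an empty-string value.
kept = parse_qs(qs, keep_blank_values=True)
print(kept['empty'])  # ['']

# doseq=True expands each list back into separate key=value pairs.
print(urlencode(kept, doseq=True))  # lang=en&tag=python&tag=urllib&empty=

# strict_parsing=True turns a malformed field (no '=') into a ValueError.
try:
    parse_qs('novalue', strict_parsing=True)
except ValueError as exc:
    print('ValueError:', exc)
```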

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

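Unlike :func:`parse_qs`, :func:`parse_qsl` keeps every pair in its original order. The following sketch illustrates this with an arbitrary query string and also shows the *encoding* argument at work:

```python
from urllib.parse import parse_qsl, urlencode

qs = 'b=2&a=1&b=3&note=caf%C3%A9+au+lait'

# Order and duplicates are preserved; '+' and %xx escapes are decoded.
pairs = parse_qsl(qs)
print(pairs)  # [('b', '2'), ('a', '1'), ('b', '3'), ('note', 'café au lait')]

# urlencode() rebuilds an equivalent query string from the list of pairs.
print(urlencode(pairs))  # b=2&a=1&b=3&note=caf%C3%A9+au+lait

# encoding controls how the %xx bytes are decoded into characters.
print(parse_qsl('x=%E9', encoding='latin-1'))  # [('x', 'é')]
```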

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

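The following sketch uses made-up URLs to show the dropped empty ``?`` and that any six-item iterable is accepted. It relies on ``_replace()``, which exists because the result classes are implemented as named tuples in CPython:

```python
from urllib.parse import urlparse, urlunparse

# An empty query: the '?' delimiter carries no information and is dropped.
parts = urlparse('http://www.python.org/doc/?')
print(urlunparse(parts))  # http://www.python.org/doc/

# A plain tuple in (scheme, netloc, path, params, query, fragment) order.
print(urlunparse(('https', 'example.com', '/a', '', 'q=1', 'top')))
# https://example.com/a?q=1#top

# A modified copy of a parse result.
print(urlunparse(parts._replace(scheme='https', query='x=1')))
# https://www.python.org/doc/?x=1
```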

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

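The difference from :func:`urlparse` is easiest to see on a path whose segments carry their own ``;`` parameters. The following is a minimal sketch with a made-up URL. The per-segment split at the end is a hypothetical illustration of the "separate function" mentioned above, not something this module provides:

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a;v=1/b;v=2?q=1#frag'

# urlparse() splits ';params' off the *last* path segment only.
p = urlparse(url)
print(p.path, p.params)  # /a;v=1/b v=2

# urlsplit() leaves every segment's parameters inside the path.
s = urlsplit(url)
print(s.path)  # /a;v=1/b;v=2

# Hypothetical per-segment handling: split each segment on its first ';'.
segments = [seg.split(';', 1) for seg in s.path.split('/')]
print(segments)  # [[''], ['a', 'v=1'], ['b', 'v=2']]
```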

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).

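A short sketch with made-up URLs showing the normalization described above and a hand-built five-item list:

```python
from urllib.parse import urlsplit, urlunsplit

parts = urlsplit('HTTPS://Example.com/search?#')
# The scheme is lower-cased; the empty query and fragment parse as ''.
print(parts.scheme, repr(parts.query), repr(parts.fragment))  # https '' ''

# The empty '?' and '#' delimiters are not reproduced.
print(urlunsplit(parts))  # https://Example.com/search

# Any five-item iterable works: (scheme, netloc, path, query, fragment).
print(urlunsplit(['ftp', 'ftp.example.org', '/pub/file.txt', '', '']))
# ftp://ftp.example.org/pub/file.txt
```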

.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

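The following is a minimal sketch of the preprocessing suggested in the note. ``join_relative_only`` is a hypothetical helper name, not part of :mod:`urllib.parse`. It keeps only the path, query and fragment of *url* before joining:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def join_relative_only(base, url):
    """Hypothetical helper: join *url* to *base* without letting *url*
    replace the base's scheme or network location."""
    parts = urlsplit(url)
    # Drop scheme and netloc; keep path, query and fragment.
    stripped = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, stripped)

base = 'http://www.cwi.nl/%7Eguido/Python.html'
print(urljoin(base, '//www.python.org/%7Eguido'))
# http://www.python.org/%7Eguido
print(join_relative_only(base, '//www.python.org/%7Eguido'))
# http://www.cwi.nl/%7Eguido
print(join_relative_only(base, 'FAQ.html'))
# http://www.cwi.nl/%7Eguido/FAQ.html
```

With this approach the host of a scheme-relative reference is discarded, and only its path is resolved against the base host. Whether that is the right policy depends on the application.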

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

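A short sketch with made-up URLs showing both the attribute access and the plain 2-tuple unpacking of the :func:`urldefrag` result:

```python
from urllib.parse import urldefrag

result = urldefrag('http://www.python.org/doc/index.html#modules')
print(result.url)       # http://www.python.org/doc/index.html
print(result.fragment)  # modules

# Without a fragment, the URL comes back unchanged with an empty fragment.
url, fragment = urldefrag('http://www.python.org/doc/')
print(url, repr(fragment))  # http://www.python.org/doc/ ''
```
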

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing functions.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

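The following minimal sketch illustrates the rules of this section with :func:`urlsplit`. The URLs are arbitrary examples, and the other parsing functions are assumed to behave the same way:

```python
from urllib.parse import urlsplit

# bytes in, bytes out: every component of the result is a bytes object.
b = urlsplit(b'http://www.python.org/doc/?q=1')
print(b.netloc, b.query)  # b'www.python.org' b'q=1'

# decode() converts the whole result to its str counterpart (ASCII by default).
s = b.decode()
print(type(s).__name__, s.netloc)  # SplitResult www.python.org

# Mixing a bytes URL with a str *scheme* in the same call is rejected.
mix_error = None
try:
    urlsplit(b'//www.python.org/doc/', scheme='http')
except TypeError as exc:
    mix_error = exc
print(type(mix_error).__name__)  # TypeError

# Non-ASCII bytes cannot be decoded as ASCII.
try:
    urlsplit(b'http://www.python.org/caf\xc3\xa9')
except UnicodeDecodeError:
    print('non-ASCII bytes rejected')
```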

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2

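A short sketch of converting between the :class:`str` and :class:`bytes` result classes listed above, using an arbitrary URL:

```python
from urllib.parse import ParseResult, ParseResultBytes, urldefrag, urlparse

r = urlparse('http://www.python.org/doc/')
rb = r.encode()  # encodes every component with 'ascii' by default
print(type(rb).__name__, rb.netloc)  # ParseResultBytes b'www.python.org'

# decode() goes back to the str class; the components compare equal.
back = rb.decode()
print(back == r, isinstance(back, ParseResult))  # True True

# Functions called with bytes return the *Bytes classes directly.
d = urldefrag(b'http://www.python.org/doc/#intro')
print(type(d).__name__, d.fragment)  # DefragResultBytes b'intro'
```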

URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :exc:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

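The following sketch illustrates the *safe* and *encoding* parameters and checks the stated equivalence with :func:`quote_from_bytes` for one sample input:

```python
from urllib.parse import quote, quote_from_bytes

print(quote('/El Niño/'))            # /El%20Ni%C3%B1o/
print(quote('/El Niño/', safe=''))   # %2FEl%20Ni%C3%B1o%2F  ('/' is quoted too)
print(quote('a/b?c=d', safe='/?='))  # a/b?c=d  (extra safe characters kept)

# encoding selects the str -> bytes conversion performed before quoting.
print(quote('Niño', encoding='latin-1'))  # Ni%F1o

# The equivalence stated above, checked for one input.
text = '/El Niño/'
print(quote(text, '/', 'utf-8', 'strict')
      == quote_from_bytes(text.encode('utf-8', 'strict'), '/'))  # True
```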

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* defaults to ``''`` rather than ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.

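The following side-by-side sketch compares :func:`quote` and :func:`quote_plus` on an arbitrary form value:

```python
from urllib.parse import quote, quote_plus

value = 'C++ & Python/3'
print(quote(value))       # C%2B%2B%20%26%20Python/3
print(quote_plus(value))  # C%2B%2B+%26+Python%2F3

# Listing '+' in safe leaves literal plus signs unescaped, so they become
# indistinguishable from the '+' that encodes a space.
print(quote_plus('a+b c', safe='+'))  # a+b+c
```

The last line shows why ``'+'`` is escaped by default: a decoder cannot tell which ``+`` stood for a space.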

.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character (U+FFFD).

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

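The following sketch shows how *encoding* and *errors* affect :func:`unquote`. The input ``'Ni%F1o'`` is a made-up example: it is Latin-1 bytes and not valid UTF-8.

```python
from urllib.parse import unquote

print(unquote('/El%20Ni%C3%B1o/'))            # /El Niño/
print(unquote('Ni%F1o', encoding='latin-1'))  # Niño

# %F1 alone is not valid UTF-8; the default errors='replace' substitutes
# U+FFFD instead of raising.
print(unquote('Ni%F1o') == 'Ni\ufffdo')  # True

# errors='strict' turns the same input into an exception.
try:
    unquote('Ni%F1o', errors='strict')
except UnicodeDecodeError as exc:
    print('strict:', exc.reason)
```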

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.

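The following sketch shows the difference from plain :func:`unquote` on a form-encoded value, and the round trip with :func:`quote_plus`:

```python
from urllib.parse import quote_plus, unquote, unquote_plus

form_value = 'C%2B%2B+%26+Python%2F3'
print(unquote_plus(form_value))  # C++ & Python/3
print(unquote(form_value))       # C+++&+Python/3  ('+' left as-is)

# unquote_plus() reverses quote_plus().
original = 'El Niño + friends'
print(unquote_plus(quote_plus(original)) == original)  # True
```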

.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

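The following sketch covers both accepted input types, the UTF-8 handling of unescaped non-ASCII characters, and a round trip with :func:`quote_from_bytes` on arbitrary binary data:

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

print(unquote_to_bytes('a%26%EF'))   # b'a&\xef'
print(unquote_to_bytes(b'a%26%EF'))  # b'a&\xef'  (bytes input works too)

# Unescaped non-ASCII characters in a str are encoded as UTF-8.
print(unquote_to_bytes('ñ%20'))  # b'\xc3\xb1 '

# Round trip with quote_from_bytes() for arbitrary binary data.
data = bytes([0, 38, 239, 255])
print(unquote_to_bytes(quote_from_bytes(data)) == data)  # True
```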

.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a "percent-encoded"
   string. If the resultant string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, it should
   be encoded to bytes first; otherwise a :exc:`TypeError` is raised.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`
   above. When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to ``True``, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding*, and *errors* parameters are passed down to
   :func:`quote_plus` (the *encoding* and *errors* parameters are only passed
   when a query element is a :class:`str`).

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used to generate the query string of a
   URL or data for a POST request.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

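The following minimal sketch shows the effect of *doseq*, order preservation for a sequence of pairs, and the encoding step needed before using the result as POST data. The parameter names and values are arbitrary examples:

```python
from urllib.parse import urlencode

params = {'q': 'url parsing', 'lang': ['en', 'de']}

# Without doseq the list is converted with str() and quoted as one value.
print(urlencode(params))
# q=url+parsing&lang=%5B%27en%27%2C+%27de%27%5D

# With doseq=True each list element becomes its own key=value pair.
query = urlencode(params, doseq=True)
print(query)  # q=url+parsing&lang=en&lang=de

# A sequence of pairs keeps the given order; non-str values go through str().
print(urlencode([('b', 2), ('a', 1)]))  # b=2&a=1

# For a POST body (e.g. the data argument of urllib.request.urlopen()),
# encode the str result to bytes first.
body = query.encode('ascii')
```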

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to this. Certain deviations can be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URLs.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for the mailto URL scheme.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.