:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up in components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   % escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by '//'. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   allowed, even if the URL's addressing scheme normally does support them. The
   default value for this argument is :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

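The :func:`urlparse` arguments and convenience attributes described above can be sketched as follows. The URLs, user name and password are made up for illustration; the calls assume a current Python 3 interpreter.

```python
from urllib.parse import urlparse

# The default *scheme* is used only when the URL itself has none.
relative = urlparse('//www.cwi.nl/%7Eguido/Python.html', scheme='https')
print(relative.scheme)            # https
explicit = urlparse('http://www.cwi.nl/', scheme='https')
print(explicit.scheme)            # http

# With allow_fragments=False the '#...' part stays in the preceding component.
no_frag = urlparse('http://www.cwi.nl/doc.html#intro', allow_fragments=False)
print(no_frag.path, repr(no_frag.fragment))   # /doc.html#intro ''

# The convenience attributes split the network location further.
full = urlparse('http://guido:secret@WWW.CWI.NL:8080/index.html')
print(full.username, full.password, full.hostname, full.port)
# guido secret www.cwi.nl 8080

# A bracketed IPv6 literal is recognized as the host.
v6 = urlparse('http://[::1]:8000/status')
print(v6.hostname, v6.port)       # ::1 8000
```

Note that ``hostname`` is lower-cased while ``netloc`` keeps the original spelling.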

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such
   dictionaries into query strings.

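A minimal sketch of how the :func:`parse_qs` flags behave, using an invented query string:

```python
from urllib.parse import parse_qs

query = 'name=Guido&lang=python&lang=c&empty='

# Repeated names collect into one list; blank values are dropped by default.
print(parse_qs(query))
# {'name': ['Guido'], 'lang': ['python', 'c']}

# keep_blank_values=True retains 'empty' with an empty-string value.
print(parse_qs(query, keep_blank_values=True))
# {'name': ['Guido'], 'lang': ['python', 'c'], 'empty': ['']}

# strict_parsing=True turns a malformed field (no '=') into a ValueError.
try:
    parse_qs('name=Guido&broken', strict_parsing=True)
except ValueError as exc:
    print('rejected:', exc)
```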

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

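For comparison with :func:`parse_qs`, a small sketch showing that :func:`parse_qsl` keeps the original order and repeated names, and that the pairs can be fed back to :func:`urlencode`. The query string is invented.

```python
from urllib.parse import parse_qsl, urlencode

pairs = parse_qsl('lang=python&name=Guido+van+Rossum&lang=c')
print(pairs)
# [('lang', 'python'), ('name', 'Guido van Rossum'), ('lang', 'c')]

# urlencode() accepts the same list of pairs, so this round trip is lossless.
print(urlencode(pairs))
# lang=python&name=Guido+van+Rossum&lang=c
```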

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

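As an illustration of :func:`urlunparse`, the sketch below builds a URL from a plain tuple and shows an unnecessary delimiter being dropped. The component values are made up.

```python
from urllib.parse import urlparse, urlunparse

# Any six-item iterable works:
# (scheme, netloc, path, params, query, fragment).
built = urlunparse(('http', 'www.cwi.nl', '/%7Eguido/Python.html', '', 'q=1', 'top'))
print(built)
# http://www.cwi.nl/%7Eguido/Python.html?q=1#top

# A trailing '?' with an empty query is not reproduced.
round_trip = urlunparse(urlparse('http://www.cwi.nl/doc/?'))
print(round_trip)
# http://www.cwi.nl/doc/
```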

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

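The difference between :func:`urlsplit` and :func:`urlparse` shows up when path segments carry ``;parameters``. A small sketch with an invented URL:

```python
from urllib.parse import urlparse, urlsplit

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=1#top'

# urlparse() splits off only the parameters of the last path segment.
parsed = urlparse(url)
print(parsed.path, parsed.params)   # /a;x=1/b y=2

# urlsplit() leaves every ';parameters' part inside the path.
split = urlsplit(url)
print(split.path)                   # /a;x=1/b;y=2
print(tuple(split))
# ('http', 'www.cwi.nl', '/a;x=1/b;y=2', 'q=1', 'top')
```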

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).

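Because *parts* may be any five-item iterable, one way to edit a single component is to copy the split result into a list first. This is only a sketch with a made-up URL, not the one recommended approach.

```python
from urllib.parse import urlsplit, urlunsplit

parts = urlsplit('http://www.cwi.nl/%7Eguido/Python.html?lang=en#intro')

# Copy into a mutable list and change one field.
fields = list(parts)
fields[3] = 'lang=nl'     # index 3 is the query component
rebuilt = urlunsplit(fields)
print(rebuilt)
# http://www.cwi.nl/%7Eguido/Python.html?lang=nl#intro
```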

.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

   .. doctest::

      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
      ...         '//www.python.org/%7Eguido')
      'http://www.python.org/%7Eguido'

   If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
   :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

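The preprocessing suggested for :func:`urljoin` could look like the sketch below. ``join_same_site`` is a hypothetical helper written for this illustration; it is not part of :mod:`urllib.parse`.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def join_same_site(base, url):
    """Join *url* onto *base*, ignoring any scheme or host in *url*.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Blank out scheme and netloc so only path, query and fragment remain.
    local = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, local)

base = 'http://www.cwi.nl/%7Eguido/Python.html'
print(urljoin(base, '//www.python.org/%7Eguido'))
# http://www.python.org/%7Eguido
print(join_same_site(base, '//www.python.org/%7Eguido'))
# http://www.cwi.nl/%7Eguido
```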

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

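A short sketch of :func:`urldefrag`, showing both attribute access and plain tuple unpacking on made-up URLs:

```python
from urllib.parse import urldefrag

result = urldefrag('http://www.cwi.nl/%7Eguido/Python.html#history')
print(result.url)        # http://www.cwi.nl/%7Eguido/Python.html
print(result.fragment)   # history

# The result still unpacks like a plain 2-tuple.
url, fragment = urldefrag('http://www.cwi.nl/')
print(repr(url), repr(fragment))   # 'http://www.cwi.nl/' ''
```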

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

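The bytes handling described in this section can be tried out with a sketch like the following. The URLs are invented, and the exact wording of the exception messages is not guaranteed.

```python
from urllib.parse import urljoin, urlsplit

# bytes in, bytes out
raw = urlsplit(b'http://www.cwi.nl/%7Eguido/Python.html')
print(raw.netloc)                         # b'www.cwi.nl'

# decode() converts the whole result to its str counterpart (ASCII by default),
# and encode() converts it back.
text = raw.decode()
print(type(text).__name__, text.netloc)   # SplitResult www.cwi.nl
print(type(text.encode()).__name__)       # SplitResultBytes

# Mixing str and bytes in one call is rejected.
try:
    urljoin('http://www.cwi.nl/', b'FAQ.html')
except TypeError as exc:
    print('TypeError:', exc)

# Non-ASCII byte values cannot be decoded.
try:
    urlsplit(b'http://www.cwi.nl/Ni\xf1o')
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)
```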

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

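The effect of the *safe* and *encoding* parameters of :func:`quote` can be sketched as follows, reusing the string from the example above. The exception message text is not guaranteed.

```python
from urllib.parse import quote, quote_from_bytes

# '/' is safe by default; pass safe='' to quote it as well.
print(quote('/El Niño/'))                          # /El%20Ni%C3%B1o/
print(quote('/El Niño/', safe=''))                 # %2FEl%20Ni%C3%B1o%2F

# A different encoding changes the escaped bytes.
print(quote('Niño', encoding='latin-1'))           # Ni%F1o

# The str form is equivalent to encoding first and quoting the bytes.
print(quote_from_bytes('Niño'.encode('utf-8')))    # Ni%C3%B1o

# Supplying encoding together with bytes input raises TypeError.
try:
    quote(b'Ni\xf1o', encoding='latin-1')
except TypeError as exc:
    print('TypeError:', exc)
```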

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

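To make the *encoding* and *errors* parameters of :func:`unquote` concrete, here is a minimal sketch. The input ``'Ni%F1o'`` is an invented Latin-1 quoted string.

```python
from urllib.parse import unquote

print(unquote('/El%20Ni%C3%B1o/'))                 # /El Niño/

# %F1 alone is not valid UTF-8, so the default errors='replace'
# substitutes U+FFFD, the replacement character.
print(unquote('Ni%F1o') == 'Ni\ufffdo')            # True

# Decoding with the encoding the data was quoted in recovers the text.
print(unquote('Ni%F1o', encoding='latin-1'))       # Niño

# errors='strict' raises instead of substituting.
try:
    unquote('Ni%F1o', errors='strict')
except UnicodeDecodeError as exc:
    print('UnicodeDecodeError:', exc.reason)
```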

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

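A brief sketch of :func:`unquote_to_bytes` with both input types, including the UTF-8 encoding of an unescaped non-ASCII character. The last input string is invented.

```python
from urllib.parse import unquote_to_bytes

print(unquote_to_bytes('a%26%EF'))        # b'a&\xef'
print(unquote_to_bytes(b'a%26%EF'))       # b'a&\xef'

# An unescaped non-ASCII character in a str is encoded as UTF-8.
print(unquote_to_bytes('Ni\u00f1o%21'))   # b'Ni\xc3\xb1o!'
```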

.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, whose
   elements may be either :class:`str` or :class:`bytes`, to a
   "percent-encoded" string, suitable to pass to
   :func:`urllib.request.urlopen` as the optional *data* argument.
   This is useful to pass a dictionary of form fields to a ``POST`` request.
   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`.
   When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element can itself be a sequence; in that case, if
   the optional parameter *doseq* evaluates to true, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   When a key or value in *query* is a :class:`str`, the *safe*, *encoding* and
   *errors* parameters are passed down to :func:`quote_plus` for encoding.

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

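The effect of *doseq* in :func:`urlencode` is easiest to see side by side. The field names and values below are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode

fields = [('name', 'Guido van Rossum'), ('lang', ['python', 'c'])]

# Without doseq the list value is converted with str() and quoted as one string.
print(urlencode(fields))
# name=Guido+van+Rossum&lang=%5B%27python%27%2C+%27c%27%5D

# With doseq=True each element of the value sequence gets its own pair.
encoded = urlencode(fields, doseq=True)
print(encoded)
# name=Guido+van+Rossum&lang=python&lang=c

# parse_qs() reverses the process.
print(parse_qs(encoded))
# {'name': ['Guido van Rossum'], 'lang': ['python', 'c']}
```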

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the
      :mod:`urllib.parse` module should conform to this. Certain deviations
      could be observed, which are mostly for backward compatibility purposes
      and for certain de-facto parsing requirements as commonly observed in
      major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.