:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up in components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up in
   smaller parts (for example, the network location is a single string), and %
   escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                 params='', query='', fragment='')
      >>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl:80/%7Eguido/Python.html',
                 params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                 query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized, even if the URL's addressing scheme normally does support them.
   Instead, a ``#`` and anything after it stay in the preceding component, and
   *fragment* is the empty string in the result. The default value for this
   argument is :const:`True`.
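
   As a minimal sketch of these two arguments (the URLs are only illustrative):

   .. code-block:: python

      from urllib.parse import urlparse

      # No scheme in the URL, so the *scheme* default is used.
      with_default = urlparse('//www.cwi.nl/%7Eguido/Python.html', scheme='http')
      print(with_default.scheme)        # http

      # A scheme given in the URL itself takes precedence over the default.
      explicit = urlparse('ftp://www.cwi.nl/pub/', scheme='http')
      print(explicit.scheme)            # ftp

      # With allow_fragments=False the '#...' part is not split off.
      no_frag = urlparse('http://www.cwi.nl/Python.html#top', allow_fragments=False)
      print(no_frag.path)               # /Python.html#top
      print(repr(no_frag.fragment))     # ''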

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such
   dictionaries into query strings.
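
   A short sketch of how the two flags change the result (the query strings are
   made up for illustration):

   .. code-block:: python

      from urllib.parse import parse_qs

      query = 'name=Guido&lang=python&lang=c&empty='

      # Repeated names are collected into one list; blank values are dropped.
      fields = parse_qs(query)
      print(fields)      # {'name': ['Guido'], 'lang': ['python', 'c']}

      # keep_blank_values=True retains 'empty' with an empty-string value.
      with_blanks = parse_qs(query, keep_blank_values=True)
      print(with_blanks['empty'])    # ['']

      # strict_parsing=True turns a field without '=' into a ValueError.
      strict_error = None
      try:
          parse_qs('name=Guido&novalue', strict_parsing=True)
      except ValueError as exc:
          strict_error = exc
      print(type(strict_error).__name__)   # ValueError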

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.
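
   For comparison with :func:`parse_qs`, a small sketch showing that the list
   form keeps the original order and repeated names (the example data is made
   up):

   .. code-block:: python

      from urllib.parse import parse_qsl

      pairs = parse_qsl('a=1&b=2&a=3')
      print(pairs)       # [('a', '1'), ('b', '2'), ('a', '3')]

      # Percent-escapes and plus signs are decoded in names and values.
      decoded = parse_qsl('city=El+Ni%C3%B1o')
      print(decoded)     # [('city', 'El Niño')]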

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).
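
   A brief sketch of both points (the URL is illustrative only):

   .. code-block:: python

      from urllib.parse import urlparse, urlunparse

      # Any six-item iterable works; an empty string means "component absent".
      parts = ('http', 'www.cwi.nl', '/index.html', '', 'q=1', 'top')
      built = urlunparse(parts)
      print(built)            # http://www.cwi.nl/index.html?q=1#top

      # A round trip drops the redundant '?' that introduced an empty query.
      round_trip = urlunparse(urlparse('http://www.cwi.nl/index.html?'))
      print(round_trip)       # http://www.cwi.nl/index.html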


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
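
   The difference from :func:`urlparse` only shows up when the path contains
   ``;`` parameters. A small sketch with a made-up URL:

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      url = 'http://www.cwi.nl:80/a;x=1/b;y=2?q=1#top'

      split = urlsplit(url)
      print(split.path)       # /a;x=1/b;y=2  (parameters stay in the path)
      print(split.hostname)   # www.cwi.nl
      print(split.port)       # 80

      parsed = urlparse(url)
      print(parsed.path)      # /a;x=1/b  (only the last segment is split)
      print(parsed.params)    # y=2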


.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).
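
   One common use is to change a single component and rebuild the URL. The
   sketch below relies on the result being a named tuple, so ``_replace()`` is
   available; the URL itself is only an example:

   .. code-block:: python

      from urllib.parse import urlsplit, urlunsplit

      parts = urlsplit('http://www.cwi.nl/%7Eguido/Python.html?lang=en#intro')

      # _replace() returns a modified copy; the original result is unchanged.
      https_parts = parts._replace(scheme='https', fragment='')
      rebuilt = urlunsplit(https_parts)
      print(rebuilt)     # https://www.cwi.nl/%7Eguido/Python.html?lang=en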


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
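
   The preprocessing suggested in the note could look like the following
   sketch. The helper name ``join_on_base_host`` is hypothetical and not part
   of this module:

   .. code-block:: python

      from urllib.parse import urljoin, urlsplit, urlunsplit

      def join_on_base_host(base, url):
          """Join url to base, ignoring any scheme or netloc in url."""
          parts = urlsplit(url)
          # Keep only the path, query and fragment of the second URL.
          relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
          return urljoin(base, relative)

      base = 'http://www.cwi.nl/%7Eguido/Python.html'

      plain = urljoin(base, '//www.python.org/%7Eguido')
      print(plain)       # http://www.python.org/%7Eguido

      pinned = join_on_base_host(base, '//www.python.org/%7Eguido')
      print(pinned)      # http://www.cwi.nl/%7Eguido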


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
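
   A minimal sketch of both access styles (the URLs are illustrative):

   .. code-block:: python

      from urllib.parse import urldefrag

      result = urldefrag('http://www.cwi.nl/doc.html#section-2')
      print(result.url)        # http://www.cwi.nl/doc.html
      print(result.fragment)   # section-2

      # The result still unpacks like a plain 2-tuple.
      url, fragment = urldefrag('http://www.cwi.nl/doc.html')
      print(url)               # http://www.cwi.nl/doc.html
      print(repr(fragment))    # ''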

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.


Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).
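
The rules above can be sketched as follows (the URLs are illustrative):

.. code-block:: python

   from urllib.parse import urljoin, urlsplit

   # bytes in, bytes out
   raw = urlsplit(b'http://www.cwi.nl/%7Eguido/Python.html')
   print(raw.netloc)                 # b'www.cwi.nl'

   # decode() gives the str flavour of the same result; encode() goes back.
   text = raw.decode()
   print(text.netloc)                # www.cwi.nl
   print(text.encode() == raw)       # True

   # Mixing str and bytes arguments is rejected.
   mix_error = None
   try:
       urljoin('http://www.cwi.nl/', b'FAQ.html')
   except TypeError as exc:
       mix_error = exc

   # Non-ASCII byte values cannot be decoded implicitly.
   ascii_error = None
   try:
       urlsplit(b'http://www.cwi.nl/\xff')
   except UnicodeDecodeError as exc:
       ascii_error = exc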

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional *safe*
   parameter specifies additional ASCII characters that should not be quoted
   --- its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
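
   A slightly longer sketch of the *safe* and *encoding* parameters and of the
   equivalence noted above (the strings are made up for illustration):

   .. code-block:: python

      from urllib.parse import quote, quote_from_bytes

      # '/' is safe by default; pass safe='' to quote it as well.
      default_quoted = quote('/El Niño/')
      fully_quoted = quote('/El Niño/', safe='')
      print(default_quoted)      # /El%20Ni%C3%B1o/
      print(fully_quoted)        # %2FEl%20Ni%C3%B1o%2F

      # A different encoding changes the bytes that get escaped.
      latin1_quoted = quote('Niño', encoding='latin-1')
      print(latin1_quoted)       # Ni%F1o

      # quote() on a str matches quote_from_bytes() on the encoded bytes.
      s = 'a&b/ñ'
      via_str = quote(s, '/', 'utf-8', 'strict')
      via_bytes = quote_from_bytes(s.encode('utf-8', 'strict'), '/')
      print(via_str == via_bytes)    # True

      # encoding/errors are rejected when the input is already bytes.
      bytes_error = None
      try:
          quote(b'abc', encoding='utf-8')
      except TypeError as exc:
          bytes_error = exc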


.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. It also does not have *safe* default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.
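
   The effect of *encoding* and *errors* is easiest to see on an escape that is
   not valid UTF-8. A small sketch, with an input chosen only for illustration:

   .. code-block:: python

      from urllib.parse import unquote

      # %F1 alone is not valid UTF-8, so by default it becomes U+FFFD.
      replaced = unquote('Ni%F1o')
      print(replaced == 'Ni\ufffdo')     # True

      # Naming the encoding that was actually used recovers the character.
      latin1 = unquote('Ni%F1o', encoding='latin-1')
      print(latin1)                      # Niño

      # errors='strict' raises instead of substituting a placeholder.
      decode_error = None
      try:
          unquote('Ni%F1o', errors='strict')
      except UnicodeDecodeError as exc:
          decode_error = exc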


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.


.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a "percent-encoded" string,
   suitable to pass to :func:`urllib.request.urlopen` as the optional *data*
   argument. This is useful to pass a dictionary of form fields to a ``POST``
   request. The resulting string is a series of ``key=value`` pairs separated by
   ``'&'`` characters, where both *key* and *value* are quoted using
   :func:`quote_plus` above. When a sequence of two-element tuples is used as
   the *query* argument, the first element of each tuple is a key and the second
   is a value. The value element can itself be a sequence; in that case, if
   the optional parameter *doseq* evaluates to true, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   When a *query* element is a :class:`str`, the *safe*, *encoding* and *errors*
   parameters are passed down to :func:`quote_plus` for encoding.

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.
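
   A minimal sketch of the mapping form, the *doseq* flag and the round trip
   through :func:`parse_qs` (the field names are made up for illustration):

   .. code-block:: python

      from urllib.parse import parse_qs, urlencode

      # A mapping of form fields; keys and values are quoted with quote_plus().
      simple = urlencode({'q': 'El Niño', 'page': 2})
      print(simple)                # q=El+Ni%C3%B1o&page=2

      # With doseq=True each element of a sequence value gets its own pair.
      multi = urlencode([('tag', ['a', 'b']), ('sort', 'asc')], doseq=True)
      print(multi)                 # tag=a&tag=b&sort=asc

      # parse_qs() reverses the encoding.
      decoded = parse_qs(multi)
      print(decoded)               # {'tag': ['a', 'b'], 'sort': ['asc']}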

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.


.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.