:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of
   the result, except for a leading slash in the *path* component, which is
   retained if present. For example:

   >>> from urllib.parse import urlparse
   >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
   >>> o   # doctest: +NORMALIZE_WHITESPACE
   ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
               params='', query='', fragment='')
   >>> o.scheme
   'http'
   >>> o.port
   80
   >>> o.geturl()
   'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``. Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

   >>> from urllib.parse import urlparse
   >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')   # doctest: +NORMALIZE_WHITESPACE
   ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
               params='', query='', fragment='')
   >>> urlparse('www.cwi.nl/%7Eguido/Python.html')   # doctest: +NORMALIZE_WHITESPACE
   ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
               params='', query='', fragment='')
   >>> urlparse('help/Python.html')   # doctest: +NORMALIZE_WHITESPACE
   ParseResult(scheme='', netloc='', path='help/Python.html', params='',
               query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized, even if the URL's addressing scheme normally does support them.
   Instead, they are parsed as part of the path, parameters or query component,
   and *fragment* is set to the empty string. The default value for this
   argument is :const:`True`.
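
   A minimal sketch of both arguments; the ``example.com`` URLs are
   placeholders chosen for illustration:

   .. code-block:: python

      from urllib.parse import urlparse

      # No scheme in the input, so the *scheme* argument supplies one.
      default_used = urlparse('//example.com/index.html', scheme='https')
      # default_used.scheme == 'https', default_used.netloc == 'example.com'

      # A scheme present in the URL takes precedence over the default.
      explicit = urlparse('http://example.com/', scheme='https')
      # explicit.scheme == 'http'

      # With allow_fragments=False the '#...' part stays in the path.
      no_frag = urlparse('http://example.com/a#b', allow_fragments=False)
      # no_frag.path == '/a#b', no_frag.fragment == ''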

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
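
   The following sketch reads the attributes that are not part of the 6-tuple;
   the URL and credentials are made-up placeholders:

   .. code-block:: python

      from urllib.parse import urlparse

      r = urlparse('http://user:secret@Example.COM:8080/path')
      netloc = r.netloc      # 'user:secret@Example.COM:8080', one unsplit string
      username = r.username  # 'user'
      password = r.password  # 'secret'
      hostname = r.hostname  # 'example.com', lower-cased
      port = r.port          # 8080, an int rather than a string

      # Attributes for parts the URL lacks are None, not empty strings.
      missing = urlparse('http://example.com/')
      # missing.username is None, missing.port is None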

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.


.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such
   dictionaries into query strings.
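
   A short sketch of the flags, using an invented query string:

   .. code-block:: python

      from urllib.parse import parse_qs

      qs = 'a=1&a=2&b=&c=x+y'
      default = parse_qs(qs)
      # {'a': ['1', '2'], 'c': ['x y']}: blank 'b' dropped, '+' decoded as space
      kept = parse_qs(qs, keep_blank_values=True)
      # {'a': ['1', '2'], 'b': [''], 'c': ['x y']}

      # With strict_parsing=True a field without '=' raises ValueError.
      strict_error = False
      try:
          parse_qs('novalue', strict_parsing=True)
      except ValueError:
          strict_error = True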

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.


.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.
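
   Because the result is a flat list, the original order and any repeated names
   are preserved as separate pairs. In this sketch (with an invented query
   string) the pairs round-trip through :func:`urlencode`:

   .. code-block:: python

      from urllib.parse import parse_qsl, urlencode

      pairs = parse_qsl('b=2&a=1&b=3')
      # [('b', '2'), ('a', '1'), ('b', '3')]
      rebuilt = urlencode(pairs)
      # 'b=2&a=1&b=3'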

   .. versionchanged:: 3.2
      Added *encoding* and *errors* parameters.


.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).
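
   For example, with placeholder URLs and components:

   .. code-block:: python

      from urllib.parse import urlparse, urlunparse

      # Nothing follows the '?', so the parsed query is ''.
      rebuilt = urlunparse(urlparse('http://www.example.com/path?'))
      # 'http://www.example.com/path': the empty '?' is dropped

      # Any six-item iterable works, not only a ParseResult.
      built = urlunparse(('https', 'example.com', '/a', 'p', 'q=1', 'frag'))
      # 'https://example.com/a;p?q=1#frag'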


.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
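
   The difference from :func:`urlparse` shows up when the last path segment
   carries parameters; the URL below is an invented example:

   .. code-block:: python

      from urllib.parse import urlparse, urlsplit

      url = 'http://example.com/a;x/b;y?q=1'
      parsed = urlparse(url)
      # parsed.path == '/a;x/b', parsed.params == 'y' (last segment only)
      split = urlsplit(url)
      # split.path == '/a;x/b;y'; the 5-tuple has no params component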


.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).
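
   A sketch of a common split, modify and recombine pattern. It relies on the
   result class being built on :func:`collections.namedtuple`, so the inherited
   ``_replace()`` method is available; the URLs are placeholders:

   .. code-block:: python

      from urllib.parse import urlsplit, urlunsplit

      parts = urlsplit('https://example.com/search?q=python#top')
      # _replace() returns a modified copy; *parts* itself is unchanged.
      no_fragment = urlunsplit(parts._replace(fragment=''))
      # 'https://example.com/search?q=python'

      # A plain five-item tuple is accepted as well.
      plain = urlunsplit(('ftp', 'ftp.example.org', '/pub/file.txt', '', ''))
      # 'ftp://ftp.example.org/pub/file.txt'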


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

   >>> from urllib.parse import urljoin
   >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
   'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

   .. doctest::

      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
      ...         '//www.python.org/%7Eguido')
      'http://www.python.org/%7Eguido'

   If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
   :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.
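
   One way to do that preprocessing is sketched below. The helper name
   ``join_path_only`` is hypothetical and not part of this module:

   .. code-block:: python

      from urllib.parse import urljoin, urlsplit, urlunsplit

      def join_path_only(base, url):
          """Join *url* onto *base*, ignoring any scheme or netloc in *url*."""
          parts = urlsplit(url)
          # Rebuild *url* from its path, query and fragment only.
          relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
          return urljoin(base, relative)

      joined = join_path_only('http://www.cwi.nl/%7Eguido/Python.html',
                              '//www.python.org/%7Eguido')
      # 'http://www.cwi.nl/%7Eguido': the host now comes from *base*

   This is only a sketch: a *url* whose path itself begins with ``//`` would
   still be read as a network location after recombination.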


.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.
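
   For example, with a placeholder URL:

   .. code-block:: python

      from urllib.parse import urldefrag

      result = urldefrag('http://example.com/doc.html#section-2')
      # result.url == 'http://example.com/doc.html'
      # result.fragment == 'section-2'

      # Being a tuple, the result also unpacks into two names.
      url, fragment = urldefrag('http://example.com/doc.html')
      # url is returned unmodified and fragment == ''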

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.
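
A brief sketch of these rules, using placeholder URLs:

.. code-block:: python

   from urllib.parse import urljoin, urlsplit

   # bytes in, bytes out
   parts = urlsplit(b'http://example.com/a?b=1')
   # parts.netloc == b'example.com', parts.query == b'b=1'

   # Mixing str and bytes in one call is rejected.
   mixed_rejected = False
   try:
       urljoin('http://example.com/', b'page.html')
   except TypeError:
       mixed_rejected = True

   # Non-ASCII bytes cannot be decoded as ASCII.
   non_ascii_rejected = False
   try:
       urlsplit(b'http://example.com/caf\xc3\xa9')
   except UnicodeDecodeError:
       non_ascii_rejected = True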

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).
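
For example, converting a :func:`urlsplit` result back and forth (the URL is a
placeholder):

.. code-block:: python

   from urllib.parse import urlsplit

   text_result = urlsplit('http://example.com/path')
   as_bytes = text_result.encode()   # a SplitResultBytes, ASCII-encoded
   # as_bytes.netloc == b'example.com'
   round_trip = as_bytes.decode()    # a str-based SplitResult again
   # round_trip == text_result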

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

   >>> from urllib.parse import urlsplit
   >>> url = 'HTTP://www.Python.org/doc/#'
   >>> r1 = urlsplit(url)
   >>> r1.geturl()
   'http://www.Python.org/doc/'
   >>> r2 = urlsplit(r1.geturl())
   >>> r2.geturl()
   'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.
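
These classes can also be instantiated directly, which may be convenient when
assembling a URL from parts. A minimal sketch with placeholder components:

.. code-block:: python

   from urllib.parse import ParseResult, SplitResult

   parsed = ParseResult(scheme='https', netloc='example.com', path='/a',
                        params='', query='x=1', fragment='')
   url = parsed.geturl()
   # 'https://example.com/a?x=1'

   split = SplitResult('https', 'example.com', '/a', 'x=1', '')
   # split.geturl() == url, and split.hostname == 'example.com'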


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.
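
   A few more cases, sketched with placeholder strings:

   .. code-block:: python

      from urllib.parse import quote, quote_from_bytes

      # Passing safe='' quotes the slashes too.
      no_safe = quote('/El Niño/', safe='')
      # '%2FEl%20Ni%C3%B1o%2F'

      # A different encoding changes the escapes for non-ASCII text.
      latin1 = quote('Niño', encoding='latin-1')
      # 'Ni%F1o'

      # The equivalence with quote_from_bytes() noted above.
      same = quote('a b/ñ') == quote_from_bytes('a b/ñ'.encode('utf-8'))
      # True

      # encoding/errors are rejected for bytes input.
      bytes_rejected = False
      try:
          quote(b'abc', encoding='utf-8')
      except TypeError:
          bytes_rejected = True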


.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* defaults to the empty string rather
   than ``'/'``, so slashes are quoted as well.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.
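
   The two functions side by side on an invented form value:

   .. code-block:: python

      from urllib.parse import quote, quote_plus

      value = 'a+b c/d'
      path_style = quote(value)       # 'a%2Bb%20c/d'
      form_style = quote_plus(value)  # 'a%2Bb+c%2Fd'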


.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.
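
   A sketch of what *encoding* and *errors* change, using a Latin-1 escape that
   is not valid UTF-8 on its own:

   .. code-block:: python

      from urllib.parse import unquote

      replaced = unquote('Ni%F1o')
      # 'Ni\ufffdo': with errors='replace', U+FFFD stands in for the bad byte
      latin1 = unquote('Ni%F1o', encoding='latin-1')
      # 'Niño'

      strict_failed = False
      try:
          unquote('Ni%F1o', errors='strict')
      except UnicodeDecodeError:
          strict_failed = True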


.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.
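
   Only literal plus signs become spaces; an escaped ``%2B`` still decodes to
   ``+``. A sketch on an invented form value:

   .. code-block:: python

      from urllib.parse import unquote, unquote_plus

      form_value = 'a%2Bb+c'
      plain = unquote(form_value)      # 'a+b+c': '+' is left alone
      form = unquote_plus(form_value)  # 'a+b c'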


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.


.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, whose elements
   may be :class:`str` or :class:`bytes` objects, to a "percent-encoded" string.
   The resulting string must be encoded to :class:`bytes` before it can be
   passed to :func:`urllib.request.urlopen` as the optional *data* argument.
   This is useful to pass a dictionary of form fields to a ``POST`` request.
   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`
   above. When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to true, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding* and *errors* parameters are passed down to
   :func:`quote_plus` for encoding; *encoding* and *errors* are only passed
   when a query element is a :class:`str`.

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.
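
   A sketch of the effect of *doseq*; the field names are invented, and a list
   of tuples is used so that the output order is fixed:

   .. code-block:: python

      from urllib.parse import parse_qs, urlencode

      query = [('tag', ['red', 'blue']), ('q', 'x y')]
      flat = urlencode(query)
      # without doseq the list goes through str() and is quoted as one value
      expanded = urlencode(query, doseq=True)
      # 'tag=red&tag=blue&q=x+y'
      parsed = parse_qs(expanded)
      # {'tag': ['red', 'blue'], 'q': ['x y']}
      data = expanded.encode('ascii')  # bytes, e.g. for a POST body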
Senthil Kumarandf022da2010-07-03 17:48:22 +0000527
528 .. versionchanged:: 3.2
Georg Brandl67b21b72010-08-17 15:07:14 +0000529 Query parameter supports bytes and string objects.


.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the
      :mod:`urllib.parse` module should conform to this. Some deviations may be
      observed; these are mostly for backward compatibility and for certain
      de facto parsing requirements commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for the mailto URL scheme.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.