:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path, etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL".

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators (and discovered a bug in an earlier draft!). It supports the
following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``, ``http``,
``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``, ``prospero``,
``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``, ``sips``,
``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple.  This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up into
   smaller parts (for example, the network location is a single string), and %
   escapes are not expanded. The delimiters as shown above are not part of the
   result, except for a leading slash in the *path* component, which is retained if
   present.  For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse` recognizes
   a netloc only if it is properly introduced by ``//``.  Otherwise the
   input is presumed to be a relative URL and thus to start with
   a path component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl:80/%7Eguido/Python.html')
      ParseResult(scheme='', netloc='', path='www.cwi.nl:80/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one.  The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   allowed, even if the URL's addressing scheme normally does support them.  The
   default value for this argument is :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`.  This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

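The *scheme* and *allow_fragments* arguments and the convenience attributes
of :func:`urlparse` can be tried out with a short sketch such as the one
below. The credentials, port and fragment in the sample URLs are made-up
values chosen only for illustration.

```python
from urllib.parse import urlparse

# *scheme* is only a fallback: it applies when the URL itself has no scheme.
no_scheme = urlparse('//www.cwi.nl/%7Eguido/Python.html', scheme='https')
has_scheme = urlparse('http://www.cwi.nl/%7Eguido/Python.html', scheme='https')
print(no_scheme.scheme)    # https
print(has_scheme.scheme)   # http

# With allow_fragments=False the '#...' part is not split off the path.
kept = urlparse('http://www.cwi.nl/doc#intro', allow_fragments=False)
print(kept.path)           # /doc#intro
print(repr(kept.fragment)) # ''

# Convenience attributes are derived from netloc (illustrative credentials).
full = urlparse('http://guido:secret@WWW.CWI.NL:8080/Python.html')
print(full.username, full.password)   # guido secret
print(full.hostname, full.port)       # www.cwi.nl 8080

# Attributes that are absent from the URL are None, not empty strings.
bare = urlparse('http://www.cwi.nl/')
print(bare.port, bare.username)       # None None
```

Note that :attr:`hostname` is lower-cased while :attr:`netloc` keeps the
original spelling.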

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a
   dictionary.  The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings.  The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors.  If false (the default), errors are silently ignored.  If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such
   dictionaries into query strings.

   .. versionchanged:: 3.2
      Added the *encoding* and *errors* parameters.

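As an informal illustration of *keep_blank_values* and *strict_parsing*,
the sketch below parses a made-up query string. The field names are
arbitrary, and a field without ``=`` is used here as the example of a
parsing error.

```python
from urllib.parse import parse_qs

qs = 'name=Guido&lang=python&lang=c&empty='

# Repeated names are collected into one list; blank values are dropped.
default = parse_qs(qs)
print(default)      # {'name': ['Guido'], 'lang': ['python', 'c']}

# keep_blank_values=True retains 'empty' with a blank string value.
with_blanks = parse_qs(qs, keep_blank_values=True)
print(with_blanks)  # {'name': ['Guido'], 'lang': ['python', 'c'], 'empty': ['']}

# A field without '=' is silently skipped by default ...
lenient = parse_qs('name=Guido&broken')
print(lenient)      # {'name': ['Guido']}

# ... and rejected when strict_parsing is true.
try:
    parse_qs('name=Guido&broken', strict_parsing=True)
    strict_failed = False
except ValueError:
    strict_failed = True
print(strict_failed)  # True
```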

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`).  Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings.  The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors.  If false (the default), errors are silently ignored.  If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Added the *encoding* and *errors* parameters.

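A small sketch of :func:`parse_qsl`, using an arbitrary sample query
string, shows that the pairs are returned as a list (so repeated names
appear more than once) and that :func:`urlencode` turns the list back
into a query string.

```python
from urllib.parse import parse_qsl, urlencode

# Pairs come back in the order they appear in the query string.
pairs = parse_qsl('b=2&a=1&b=3&city=El+Ni%C3%B1o')
print(pairs)
# [('b', '2'), ('a', '1'), ('b', '3'), ('city', 'El Niño')]

# urlencode() converts the list of pairs back into a query string.
rebuilt = urlencode(pairs)
print(rebuilt)   # b=2&a=1&b=3&city=El+Ni%C3%B1o
```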

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by ``urlparse()``. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

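The following sketch illustrates both points about :func:`urlunparse`:
a redundant ``?`` is not reproduced, and *parts* need not come from
:func:`urlparse`. The component values in the second call are invented
for the example.

```python
from urllib.parse import urlparse, urlunparse

# The trailing '?' introduces an empty query, which is dropped on output.
parts = urlparse('http://www.cwi.nl/%7Eguido/Python.html?')
rebuilt = urlunparse(parts)
print(rebuilt)   # http://www.cwi.nl/%7Eguido/Python.html

# Any six-item iterable works:
# (scheme, netloc, path, params, query, fragment)
built = urlunparse(['https', 'www.python.org', '/search', '', 'q=urllib', 'results'])
print(built)     # https://www.python.org/search?q=urllib#results
```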

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted.  A separate function is needed to
   separate the path segments and parameters.  This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`.  This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

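To make the difference between :func:`urlsplit` and :func:`urlparse`
concrete, the sketch below runs both on a sample URL whose path segments
each carry a parameter. The URL is invented for the example.

```python
from urllib.parse import urlparse, urlsplit

url = 'http://www.cwi.nl/a;x=1/b;y=2?q=1#top'

# urlparse() splits off only the parameters of the last path segment.
parsed = urlparse(url)
print(parsed.path)     # /a;x=1/b
print(parsed.params)   # y=2

# urlsplit() leaves every ';...' parameter inside the path.
split = urlsplit(url)
print(split.path)      # /a;x=1/b;y=2

# The results are a 6-tuple and a 5-tuple respectively.
print(len(parsed), len(split))   # 6 5
```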

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a ``?``
   with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*).  Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL.  For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result.  For example:

      .. doctest::

         >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
         ...         '//www.python.org/%7Eguido')
         'http://www.python.org/%7Eguido'

      If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
      :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

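The sketch below shows a few common kinds of relative reference resolved
with :func:`urljoin`, followed by one possible way to do the preprocessing
suggested in the note. The helper ``join_on_base_host`` is a hypothetical
name introduced here and is not part of the module; the base URL is a
sample value.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

base = 'http://www.cwi.nl/%7Eguido/docs/Python.html'

sibling = urljoin(base, 'FAQ.html')         # replaces the last path segment
parent = urljoin(base, '../index.html')     # '..' climbs one directory
rooted = urljoin(base, '/about')            # leading '/' replaces the whole path
query_only = urljoin(base, '?page=2')       # keeps the base path
print(sibling)     # http://www.cwi.nl/%7Eguido/docs/FAQ.html
print(parent)      # http://www.cwi.nl/%7Eguido/index.html
print(rooted)      # http://www.cwi.nl/about
print(query_only)  # http://www.cwi.nl/%7Eguido/docs/Python.html?page=2


def join_on_base_host(base, url):
    """Hypothetical helper: resolve *url* against *base* after discarding
    any scheme or network location that *url* carries."""
    parts = urlsplit(url)
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)


pinned = join_on_base_host(base, '//www.python.org/%7Eguido')
print(pinned)      # http://www.cwi.nl/%7Eguido
```

Whether discarding the foreign host is the right policy depends on the
application; rejecting such URLs outright may be preferable in some cases.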

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string.  If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`.  This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

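A brief sketch of :func:`urldefrag` on sample URLs, showing both attribute
access and plain tuple unpacking of the result:

```python
from urllib.parse import urldefrag

result = urldefrag('http://www.cwi.nl/Python.html#history')
print(result.url)        # http://www.cwi.nl/Python.html
print(result.fragment)   # history

# The result is still a 2-tuple, so it can be unpacked directly.
url, fragment = result

# Without a fragment the URL comes back unchanged, paired with ''.
plain = urldefrag('http://www.cwi.nl/Python.html')
print(plain.url, repr(plain.fragment))   # http://www.cwi.nl/Python.html ''
```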
.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

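The rules above can be exercised with a short sketch like the following.
The URLs are sample values; the final call deliberately passes UTF-8
encoded (non-ASCII) bytes to trigger the error.

```python
from urllib.parse import urljoin, urlparse

text = urlparse('http://www.cwi.nl/%7Eguido/')
raw = urlparse(b'http://www.cwi.nl/%7Eguido/')
print(type(text).__name__, type(raw).__name__)  # ParseResult ParseResultBytes
print(raw.netloc)                               # b'www.cwi.nl'

# decode() and encode() convert between the two result types.
round_trip_ok = (raw.decode() == text) and (text.encode() == raw)
print(round_trip_ok)                            # True

# Mixing str and bytes in one call is rejected.
try:
    urljoin('http://www.cwi.nl/', b'FAQ.html')
    mixed_error = None
except TypeError as exc:
    mixed_error = exc
print(type(mixed_error).__name__)               # TypeError

# Bytes outside the ASCII range are rejected as well.
try:
    urlparse(b'http://www.cwi.nl/Ni\xc3\xb1o')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
print(type(ascii_error).__name__)               # UnicodeDecodeError
```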
Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing methods.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.


.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2


URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape.  Letters,
   digits, and the characters ``'_.-'`` are never quoted.  By default, this
   function is intended for quoting the path section of a URL.  The optional *safe*
   parameter specifies additional ASCII characters that should not be quoted;
   its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :class:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :class:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

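The effect of the *safe* and *encoding* parameters of :func:`quote` can
be seen in the sketch below. The input strings are arbitrary examples.

```python
from urllib.parse import quote

# Default: '/' is left alone, other special characters are escaped.
path = quote('/docs/El Niño?.html')
print(path)          # /docs/El%20Ni%C3%B1o%3F.html

# safe='' also escapes '/', which suits a single path segment.
segment = quote('/docs/El Niño', safe='')
print(segment)       # %2Fdocs%2FEl%20Ni%C3%B1o

# Extra characters listed in *safe* pass through unchanged.
kept = quote('a&b=c', safe='&=')
print(kept)          # a&b=c

# *encoding* selects how non-ASCII text becomes bytes before escaping.
latin = quote('Niño', encoding='latin-1')
print(latin)         # Ni%F1o

# bytes input is escaped as-is; supplying *encoding* with it is an error.
from_bytes = quote(b'Ni\xf1o')
print(from_bytes)    # Ni%F1o
try:
    quote(b'Ni\xf1o', encoding='latin-1')
    bytes_error = None
except TypeError as exc:
    bytes_error = exc
print(type(bytes_error).__name__)   # TypeError
```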

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*.  Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.

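A short comparison of :func:`quote_plus` with :func:`quote`, using
arbitrary sample strings, may help to show the differences in space,
plus sign and slash handling:

```python
from urllib.parse import quote, quote_plus

# Spaces become '+', while a literal '+' is escaped as %2B.
form_value = quote_plus('El Niño & co+')
print(form_value)    # El+Ni%C3%B1o+%26+co%2B

# '/' is escaped by quote_plus() but not by quote().
print(quote_plus('a/b c'))   # a%2Fb+c
print(quote('a/b c'))        # a/b%20c

# Listing '/' in *safe* keeps it unescaped.
with_slash = quote_plus('a/b c', safe='/')
print(with_slash)    # a/b+c
```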

.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

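The sketch below illustrates how *encoding* and *errors* affect
:func:`unquote`. It uses ``%F1``, which is ``ñ`` in Latin-1 but not a
valid UTF-8 sequence on its own; the input strings are sample values.

```python
from urllib.parse import unquote

# Valid UTF-8 escapes decode to the expected text.
ok = unquote('/El%20Ni%C3%B1o/')
print(ok)              # /El Niño/

# With the defaults, an invalid UTF-8 byte becomes U+FFFD.
replaced = unquote('Ni%F1o')
print(ascii(replaced)) # 'Ni\ufffdo'

# Naming the actual encoding recovers the character.
latin = unquote('Ni%F1o', encoding='latin-1')
print(latin)           # Niño

# errors='strict' raises instead of substituting a placeholder.
try:
    unquote('Ni%F1o', errors='strict')
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc
print(type(strict_error).__name__)   # UnicodeDecodeError
```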

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

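As a small illustration of the input types accepted by
:func:`unquote_to_bytes`, the sketch below feeds it an escaped string,
an unescaped non-ASCII string and a bytes object. The inputs are sample
values.

```python
from urllib.parse import unquote_to_bytes

# Escapes become single octets.
escaped = unquote_to_bytes('Ni%C3%B1o')
print(escaped)     # b'Ni\xc3\xb1o'

# An unescaped non-ASCII character in a str is encoded as UTF-8.
unescaped = unquote_to_bytes('Niño')
print(unescaped)   # b'Ni\xc3\xb1o'

# bytes input is accepted as well.
from_bytes = unquote_to_bytes(b'a%26b')
print(from_bytes)  # b'a&b'

# The caller chooses how (and whether) to decode the octets afterwards.
print(escaped.decode('utf-8'))   # Niño
```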

.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, whose
   elements may be :class:`str` or :class:`bytes` objects, to a
   "percent-encoded" string.  The resultant string must be converted to bytes
   using the user-specified encoding before it is sent to :func:`urlopen` as
   the optional *data* argument.
   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`
   above.  When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value.  The value element can itself be a sequence; in that case, if
   the optional parameter *doseq* evaluates to true, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key.  The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   When a key or value in *query* is a :class:`str`, the *safe*, *encoding*
   and *errors* parameters are passed down to :func:`quote_plus` for encoding.

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used to generate a query string for a
   URL or data for a POST request.

   .. versionchanged:: 3.2
      The *query* parameter supports bytes and string objects.

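A minimal sketch of :func:`urlencode`, with invented keys and values,
covering ordering, the *doseq* flag and the conversion to bytes that is
needed before the result can be used as request data:

```python
from urllib.parse import parse_qsl, urlencode

# A sequence of pairs keeps its order; values are quoted with quote_plus().
pairs = [('q', 'El Niño'), ('tag', 'storm'), ('tag', 'sea')]
encoded = urlencode(pairs)
print(encoded)       # q=El+Ni%C3%B1o&tag=storm&tag=sea

# With doseq=True a sequence value expands into one pair per element.
expanded = urlencode({'tag': ['storm', 'sea']}, doseq=True)
print(expanded)      # tag=storm&tag=sea

# Without doseq the list is quoted as its str() representation.
flattened = urlencode({'tag': ['storm', 'sea']})
print(flattened)     # tag=%5B%27storm%27%2C+%27sea%27%5D

# The result is a str; encode it to bytes before using it as POST data.
data = encoded.encode('ascii')
print(data)          # b'q=El+Ni%C3%B1o&tag=storm&tag=sea'

# parse_qsl() reverses the encoding.
print(parse_qsl(encoded) == pairs)   # True
```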

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the :mod:`urllib.parse`
      module should conform to this. Certain deviations could be observed, which are
      mostly for backward compatibility purposes and for certain de-facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URL's.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.