:mod:`urllib.parse` --- Parse URLs into components
==================================================

.. module:: urllib.parse
   :synopsis: Parse URLs into or assemble them from components.


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   pair: URL; parsing
   pair: relative; URL

**Source code:** :source:`Lib/urllib/parse.py`

--------------

This module defines a standard interface to break Uniform Resource Locator (URL)
strings up into components (addressing scheme, network location, path etc.), to
combine the components back into a URL string, and to convert a "relative URL"
to an absolute URL given a "base URL."

The module has been designed to match the Internet RFC on Relative Uniform
Resource Locators, :rfc:`1808` (and discovered a bug in an earlier draft!). It
supports the following URL schemes: ``file``, ``ftp``, ``gopher``, ``hdl``,
``http``, ``https``, ``imap``, ``mailto``, ``mms``, ``news``, ``nntp``,
``prospero``, ``rsync``, ``rtsp``, ``rtspu``, ``sftp``, ``shttp``, ``sip``,
``sips``, ``snews``, ``svn``, ``svn+ssh``, ``telnet``, ``wais``.

The :mod:`urllib.parse` module defines functions that fall into two broad
categories: URL parsing and URL quoting. These are covered in detail in
the following sections.

URL Parsing
-----------

The URL parsing functions focus on splitting a URL string into its components,
or on combining URL components into a URL string.

.. function:: urlparse(urlstring, scheme='', allow_fragments=True)

   Parse a URL into six components, returning a 6-tuple. This corresponds to the
   general structure of a URL: ``scheme://netloc/path;parameters?query#fragment``.
   Each tuple item is a string, possibly empty. The components are not broken up
   into smaller parts (for example, the network location is a single string), and
   ``%`` escapes are not expanded. The delimiters as shown above are not part of
   the result, except for a leading slash in the *path* component, which is
   retained if present. For example:

      >>> from urllib.parse import urlparse
      >>> o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
      >>> o   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> o.scheme
      'http'
      >>> o.port
      80
      >>> o.geturl()
      'http://www.cwi.nl:80/%7Eguido/Python.html'

   Following the syntax specifications in :rfc:`1808`, :func:`urlparse`
   recognizes a netloc only if it is properly introduced by ``//``. Otherwise
   the input is presumed to be a relative URL and thus to start with a path
   component.

      >>> from urllib.parse import urlparse
      >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html')   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('www.cwi.nl/%7Eguido/Python.html')   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html',
                  params='', query='', fragment='')
      >>> urlparse('help/Python.html')   # doctest: +NORMALIZE_WHITESPACE
      ParseResult(scheme='', netloc='', path='help/Python.html', params='',
                  query='', fragment='')

   If the *scheme* argument is specified, it gives the default addressing
   scheme, to be used only if the URL does not specify one. The default value for
   this argument is the empty string.

   If the *allow_fragments* argument is false, fragment identifiers are not
   recognized: a ``#`` and the text after it remain part of the preceding
   component (the path, parameters or query), and the *fragment* item is set to
   the empty string. The default value for this argument is :const:`True`.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+--------------------------+----------------------+
   | Attribute        | Index | Value                    | Value if not present |
   +==================+=======+==========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier     | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part    | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path        | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`params`   | 3     | Parameters for last path | empty string         |
   |                  |       | element                  |                      |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`query`    | 4     | Query component          | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`fragment` | 5     | Fragment identifier      | empty string         |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`username` |       | User name                | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`password` |       | Password                 | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)   | :const:`None`        |
   +------------------+-------+--------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer,  | :const:`None`        |
   |                  |       | if present               |                      |
   +------------------+-------+--------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Added IPv6 URL parsing capabilities.

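As an illustration (the URLs are made-up examples, and the output comments assume a recent Python 3), the following sketch exercises the convenience attributes listed above together with the *scheme* and *allow_fragments* arguments:

```python
from urllib.parse import urlparse

# Illustrative URL with user info, a mixed-case host, a port, params and a fragment.
r = urlparse('http://user:secret@WWW.Example.com:8080/path;type=a?q=1#top')
print(r.username, r.password, r.hostname, r.port)  # user secret www.example.com 8080
print(r.path, r.params, r.query, r.fragment)       # /path type=a q=1 top

# The scheme argument is only a default; an explicit scheme in the URL wins.
print(urlparse('//www.example.com/', scheme='https').scheme)      # https
print(urlparse('ftp://www.example.com/', scheme='https').scheme)  # ftp

# With allow_fragments=False, '#frag' is not split off and stays in the query.
nf = urlparse('http://example.com/p?q=1#frag', allow_fragments=False)
print(repr(nf.query), repr(nf.fragment))  # 'q=1#frag' ''
```

Note that :attr:`hostname` is lower-cased, while :attr:`username` and :attr:`password` are returned exactly as written.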

.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a
   dictionary. The dictionary keys are the unique query variable names and the
   values are lists of values for each name.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function (with the *doseq* parameter
   set to true) to convert such dictionaries into query strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

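The following minimal sketch shows how repeated names are grouped into lists and how *keep_blank_values* and *strict_parsing* change the result; the query string is an invented example, and the dictionary ordering in the comment assumes Python 3.7 or later:

```python
from urllib.parse import parse_qs

qs = 'name=Guido&lang=python&lang=c&empty=&q=El+Ni%C3%B1o'

# Repeated names are collected into one list; blank values are dropped by default.
print(parse_qs(qs))
# {'name': ['Guido'], 'lang': ['python', 'c'], 'q': ['El Niño']}

# keep_blank_values=True retains 'empty' as a blank string.
print(parse_qs(qs, keep_blank_values=True)['empty'])  # ['']

# strict_parsing=True turns a field without '=' into a ValueError.
try:
    parse_qs('novalue', strict_parsing=True)
except ValueError as exc:
    strict_error = exc
```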

.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace')

   Parse a query string given as a string argument (data of type
   :mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
   name, value pairs.

   The optional argument *keep_blank_values* is a flag indicating whether blank
   values in percent-encoded queries should be treated as blank strings. A true value
   indicates that blanks should be retained as blank strings. The default false
   value indicates that blank values are to be ignored and treated as if they were
   not included.

   The optional argument *strict_parsing* is a flag indicating what to do with
   parsing errors. If false (the default), errors are silently ignored. If true,
   errors raise a :exc:`ValueError` exception.

   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
   query strings.

   .. versionchanged:: 3.2
      Add *encoding* and *errors* parameters.

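As a sketch of the difference from :func:`parse_qs` (the query string is invented for illustration), :func:`parse_qsl` keeps duplicate names and their original order, so its output can be passed straight back to :func:`urlencode`:

```python
from urllib.parse import parse_qsl, urlencode

query = 'b=2&a=1&b=3&q=El+Ni%C3%B1o'

# Every pair is kept, in its original order; '+' and %-escapes are decoded.
pairs = parse_qsl(query)
print(pairs)  # [('b', '2'), ('a', '1'), ('b', '3'), ('q', 'El Niño')]

# Feeding the pairs back to urlencode reproduces the query string here.
print(urlencode(pairs) == query)  # True
```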

.. function:: urlunparse(parts)

   Construct a URL from a tuple as returned by :func:`urlparse`. The *parts*
   argument can be any six-item iterable. This may result in a slightly
   different, but equivalent URL, if the URL that was parsed originally had
   unnecessary delimiters (for example, a ``?`` with an empty query; the RFC
   states that these are equivalent).

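A short sketch of both points, using made-up URLs: an empty query is dropped when the URL is rebuilt, and a plain list works as well as a parse result:

```python
from urllib.parse import urlparse, urlunparse

# The '?' with an empty query is an unnecessary delimiter and is dropped.
parts = urlparse('http://www.example.com/index.html?')
print(urlunparse(parts))  # http://www.example.com/index.html

# Any six-item iterable is accepted, here a list laid out as
# (scheme, netloc, path, params, query, fragment).
print(urlunparse(['https', 'example.com', '/a', '', 'x=1', 'sec']))
# https://example.com/a?x=1#sec
```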

.. function:: urlsplit(urlstring, scheme='', allow_fragments=True)

   This is similar to :func:`urlparse`, but does not split the params from the URL.
   This should generally be used instead of :func:`urlparse` if the more recent URL
   syntax allowing parameters to be applied to each segment of the *path* portion
   of the URL (see :rfc:`2396`) is wanted. A separate function is needed to
   separate the path segments and parameters. This function returns a 5-tuple:
   (addressing scheme, network location, path, query, fragment identifier).

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`scheme`   | 0     | URL scheme specifier    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`netloc`   | 1     | Network location part   | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`path`     | 2     | Hierarchical path       | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`query`    | 3     | Query component         | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 4     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`username` |       | User name               | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`password` |       | Password                | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`hostname` |       | Host name (lower case)  | :const:`None`        |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`port`     |       | Port number as integer, | :const:`None`        |
   |                  |       | if present              |                      |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

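The difference from :func:`urlparse` is easiest to see on a path whose segments carry parameters; in this illustrative sketch (the URL is invented), only the last segment's parameters are split off by :func:`urlparse`, while :func:`urlsplit` leaves the path whole:

```python
from urllib.parse import urlparse, urlsplit

url = 'http://example.com/a;x=1/b;y=2?q=1'

# urlparse splits parameters off the last path segment only.
p = urlparse(url)
print(p.path, p.params)  # /a;x=1/b y=2

# urlsplit keeps every segment's parameters inside the path.
s = urlsplit(url)
print(s.path, s.query)   # /a;x=1/b;y=2 q=1
```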

.. function:: urlunsplit(parts)

   Combine the elements of a tuple as returned by :func:`urlsplit` into a
   complete URL as a string. The *parts* argument can be any five-item
   iterable. This may result in a slightly different, but equivalent URL, if the
   URL that was parsed originally had unnecessary delimiters (for example, a
   ``?`` with an empty query; the RFC states that these are equivalent).


.. function:: urljoin(base, url, allow_fragments=True)

   Construct a full ("absolute") URL by combining a "base URL" (*base*) with
   another URL (*url*). Informally, this uses components of the base URL, in
   particular the addressing scheme, the network location and (part of) the
   path, to provide missing components in the relative URL. For example:

      >>> from urllib.parse import urljoin
      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html')
      'http://www.cwi.nl/%7Eguido/FAQ.html'

   The *allow_fragments* argument has the same meaning and default as for
   :func:`urlparse`.

   .. note::

      If *url* is an absolute URL (that is, starting with ``//`` or ``scheme://``),
      the *url*'s host name and/or scheme will be present in the result. For example:

   .. doctest::

      >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html',
      ...         '//www.python.org/%7Eguido')
      'http://www.python.org/%7Eguido'

   If you do not want that behavior, preprocess the *url* with :func:`urlsplit` and
   :func:`urlunsplit`, removing possible *scheme* and *netloc* parts.

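One way to do that preprocessing is sketched below. The helper ``join_relative_only`` is hypothetical and not part of :mod:`urllib.parse`; it keeps only the path, query and fragment of *url* before calling :func:`urljoin`:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit

def join_relative_only(base, url):
    """Hypothetical helper: resolve *url* against *base*, discarding any
    scheme or network location that *url* carries."""
    parts = urlsplit(url)
    # Rebuild *url* from its path, query and fragment only.
    relative = urlunsplit(('', '', parts.path, parts.query, parts.fragment))
    return urljoin(base, relative)

base = 'http://www.cwi.nl/%7Eguido/Python.html'
print(urljoin(base, '//www.python.org/%7Eguido'))             # http://www.python.org/%7Eguido
print(join_relative_only(base, '//www.python.org/%7Eguido'))  # http://www.cwi.nl/%7Eguido
print(join_relative_only(base, 'FAQ.html'))                   # http://www.cwi.nl/%7Eguido/FAQ.html
```

This sketch is not hardened: a *url* whose path itself begins with ``//`` may be re-read as a network location once it is rebuilt, so such paths need separate validation if that matters.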

.. function:: urldefrag(url)

   If *url* contains a fragment identifier, return a modified version of *url*
   with no fragment identifier, and the fragment identifier as a separate
   string. If there is no fragment identifier in *url*, return *url* unmodified
   and an empty string.

   The return value is actually an instance of a subclass of :class:`tuple`. This
   class has the following additional read-only convenience attributes:

   +------------------+-------+-------------------------+----------------------+
   | Attribute        | Index | Value                   | Value if not present |
   +==================+=======+=========================+======================+
   | :attr:`url`      | 0     | URL with no fragment    | empty string         |
   +------------------+-------+-------------------------+----------------------+
   | :attr:`fragment` | 1     | Fragment identifier     | empty string         |
   +------------------+-------+-------------------------+----------------------+

   See section :ref:`urlparse-result-object` for more information on the result
   object.

   .. versionchanged:: 3.2
      Result is a structured object rather than a simple 2-tuple.

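A minimal sketch with an invented URL, showing both attribute access and plain tuple unpacking of the result:

```python
from urllib.parse import urldefrag

r = urldefrag('http://example.com/doc.html#section-2')
print(r.url)       # http://example.com/doc.html
print(r.fragment)  # section-2

# Without a fragment, the URL comes back unmodified with an empty fragment.
url, frag = urldefrag('http://example.com/doc.html')
print(repr(frag))  # ''
```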
.. _parsing-ascii-encoded-bytes:

Parsing ASCII Encoded Bytes
---------------------------

The URL parsing functions were originally designed to operate on character
strings only. In practice, it is useful to be able to manipulate properly
quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the
URL parsing functions in this module all operate on :class:`bytes` and
:class:`bytearray` objects in addition to :class:`str` objects.

If :class:`str` data is passed in, the result will also contain only
:class:`str` data. If :class:`bytes` or :class:`bytearray` data is
passed in, the result will contain only :class:`bytes` data.

Attempting to mix :class:`str` data with :class:`bytes` or
:class:`bytearray` in a single function call will result in a
:exc:`TypeError` being raised, while attempting to pass in non-ASCII
byte values will trigger :exc:`UnicodeDecodeError`.

To support easier conversion of result objects between :class:`str` and
:class:`bytes`, all return values from URL parsing functions provide
either an :meth:`encode` method (when the result contains :class:`str`
data) or a :meth:`decode` method (when the result contains :class:`bytes`
data). The signatures of these methods match those of the corresponding
:class:`str` and :class:`bytes` methods (except that the default encoding
is ``'ascii'`` rather than ``'utf-8'``). Each produces a value of a
corresponding type that contains either :class:`bytes` data (for
:meth:`encode` methods) or :class:`str` data (for
:meth:`decode` methods).

Applications that need to operate on potentially improperly quoted URLs
that may contain non-ASCII data will need to do their own decoding from
bytes to characters before invoking the URL parsing functions.

The behaviour described in this section applies only to the URL parsing
functions. The URL quoting functions use their own rules when producing
or consuming byte sequences as detailed in the documentation of the
individual URL quoting functions.

.. versionchanged:: 3.2
   URL parsing functions now accept ASCII encoded byte sequences.

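The rules above can be sketched as follows (the URLs are illustrative; the two error cases are caught so that the exception objects can be inspected):

```python
from urllib.parse import urlsplit, urljoin

# bytes in, bytes out; decode() converts the whole result to str (ASCII by default).
r = urlsplit(b'http://example.com/a?x=1')
print(r.netloc)           # b'example.com'
print(r.decode().netloc)  # example.com

# Mixing str and bytes arguments in one call raises TypeError.
try:
    urljoin('http://example.com/', b'page.html')
except TypeError as exc:
    mixed_error = exc

# Non-ASCII byte values cannot be decoded as ASCII.
try:
    urlsplit(b'http://example.com/caf\xc3\xa9')
except UnicodeDecodeError as exc:
    non_ascii_error = exc
```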

.. _urlparse-result-object:

Structured Parse Results
------------------------

The result objects from the :func:`urlparse`, :func:`urlsplit` and
:func:`urldefrag` functions are subclasses of the :class:`tuple` type.
These subclasses add the attributes listed in the documentation for
those functions, the encoding and decoding support described in the
previous section, as well as an additional method:

.. method:: urllib.parse.SplitResult.geturl()

   Return the re-combined version of the original URL as a string. This may
   differ from the original URL in that the scheme may be normalized to lower
   case and empty components may be dropped. Specifically, empty parameters,
   queries, and fragment identifiers will be removed.

   For :func:`urldefrag` results, only empty fragment identifiers will be removed.
   For :func:`urlsplit` and :func:`urlparse` results, all noted changes will be
   made to the URL returned by this method.

   The result of this method remains unchanged if passed back through the original
   parsing function:

      >>> from urllib.parse import urlsplit
      >>> url = 'HTTP://www.Python.org/doc/#'
      >>> r1 = urlsplit(url)
      >>> r1.geturl()
      'http://www.Python.org/doc/'
      >>> r2 = urlsplit(r1.geturl())
      >>> r2.geturl()
      'http://www.Python.org/doc/'


The following classes provide the implementations of the structured parse
results when operating on :class:`str` objects:

.. class:: DefragResult(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`DefragResultBytes`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResult(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`ParseResultBytes`
   instance.

.. class:: SplitResult(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`str`
   data. The :meth:`encode` method returns a :class:`SplitResultBytes`
   instance.


The following classes provide the implementations of the parse results when
operating on :class:`bytes` or :class:`bytearray` objects:

.. class:: DefragResultBytes(url, fragment)

   Concrete class for :func:`urldefrag` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`DefragResult`
   instance.

   .. versionadded:: 3.2

.. class:: ParseResultBytes(scheme, netloc, path, params, query, fragment)

   Concrete class for :func:`urlparse` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`ParseResult`
   instance.

   .. versionadded:: 3.2

.. class:: SplitResultBytes(scheme, netloc, path, query, fragment)

   Concrete class for :func:`urlsplit` results containing :class:`bytes`
   data. The :meth:`decode` method returns a :class:`SplitResult`
   instance.

   .. versionadded:: 3.2

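As a sketch of how these classes relate (using an invented URL), encoding a :class:`SplitResult` yields a :class:`SplitResultBytes`, and decoding that again gives back an equal :class:`SplitResult`:

```python
from urllib.parse import urlsplit, SplitResult, SplitResultBytes

s = urlsplit('http://example.com/path?q=1')
b = s.encode()  # encodes each component; the default encoding is 'ascii'

print(type(s).__name__, type(b).__name__)  # SplitResult SplitResultBytes
print(b.netloc)                            # b'example.com'
print(b.decode() == s)                     # True

# Being tuple subclasses, the results also unpack positionally.
scheme, netloc, path, query, fragment = s
```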

URL Quoting
-----------

The URL quoting functions focus on taking program data and making it safe
for use as URL components by quoting special characters and appropriately
encoding non-ASCII text. They also support reversing these operations to
recreate the original data from the contents of a URL component if that
task isn't already covered by the URL parsing functions above.

.. function:: quote(string, safe='/', encoding=None, errors=None)

   Replace special characters in *string* using the ``%xx`` escape. Letters,
   digits, and the characters ``'_.-'`` are never quoted. By default, this
   function is intended for quoting the path section of a URL. The optional
   *safe* parameter specifies additional ASCII characters that should not be
   quoted; its default value is ``'/'``.

   *string* may be either a :class:`str` or a :class:`bytes`.

   The optional *encoding* and *errors* parameters specify how to deal with
   non-ASCII characters, as accepted by the :meth:`str.encode` method.
   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'strict'``, meaning unsupported characters raise a
   :exc:`UnicodeEncodeError`.
   *encoding* and *errors* must not be supplied if *string* is a
   :class:`bytes`, or a :exc:`TypeError` is raised.

   Note that ``quote(string, safe, encoding, errors)`` is equivalent to
   ``quote_from_bytes(string.encode(encoding, errors), safe)``.

   Example: ``quote('/El Niño/')`` yields ``'/El%20Ni%C3%B1o/'``.

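A few illustrative calls, showing the effect of *safe* and *encoding*, and that *encoding* is rejected for :class:`bytes` input:

```python
from urllib.parse import quote

print(quote('/El Niño/'))                 # /El%20Ni%C3%B1o/  ('/' is safe by default)
print(quote('/El Niño/', safe=''))        # %2FEl%20Ni%C3%B1o%2F
print(quote('a b&c=d', safe='&='))        # a%20b&c=d
print(quote('café', encoding='latin-1'))  # caf%E9

# encoding/errors must not be given for bytes input.
try:
    quote(b'caf\xe9', encoding='latin-1')
except TypeError as exc:
    bytes_error = exc
```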

.. function:: quote_plus(string, safe='', encoding=None, errors=None)

   Like :func:`quote`, but also replace spaces by plus signs, as required for
   quoting HTML form values when building up a query string to go into a URL.
   Plus signs in the original string are escaped unless they are included in
   *safe*. Unlike :func:`quote`, *safe* does not default to ``'/'``.

   Example: ``quote_plus('/El Niño/')`` yields ``'%2FEl+Ni%C3%B1o%2F'``.

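The sketch below (with an invented input string) shows why plus signs are escaped by default: once ``'+'`` is made safe, a literal plus can no longer be told apart from an encoded space when the value is decoded with :func:`unquote_plus`:

```python
from urllib.parse import quote_plus, unquote_plus

print(quote_plus('a+b c'))                           # a%2Bb+c  (literal '+' escaped)
print(unquote_plus(quote_plus('a+b c')))             # a+b c    (round trip intact)

print(quote_plus('a+b c', safe='+'))                 # a+b+c    (now ambiguous)
print(unquote_plus(quote_plus('a+b c', safe='+')))   # a b c    (the '+' is lost)
```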

.. function:: quote_from_bytes(bytes, safe='/')

   Like :func:`quote`, but accepts a :class:`bytes` object rather than a
   :class:`str`, and does not perform string-to-bytes encoding.

   Example: ``quote_from_bytes(b'a&\xef')`` yields
   ``'a%26%EF'``.


.. function:: unquote(string, encoding='utf-8', errors='replace')

   Replace ``%xx`` escapes by their single-character equivalent.
   The optional *encoding* and *errors* parameters specify how to decode
   percent-encoded sequences into Unicode characters, as accepted by the
   :meth:`bytes.decode` method.

   *string* must be a :class:`str`.

   *encoding* defaults to ``'utf-8'``.
   *errors* defaults to ``'replace'``, meaning invalid sequences are replaced
   by a placeholder character.

   Example: ``unquote('/El%20Ni%C3%B1o/')`` yields ``'/El Niño/'``.

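A short sketch of the *encoding* and *errors* defaults in action; the inputs are invented, and ``%E9`` is the Latin-1 encoding of ``é``, which is not valid UTF-8 on its own:

```python
from urllib.parse import unquote

print(unquote('/El%20Ni%C3%B1o/'))            # /El Niño/

# With the default errors='replace', the invalid UTF-8 byte becomes U+FFFD.
print(unquote('caf%E9') == 'caf\ufffd')       # True
print(unquote('caf%E9', encoding='latin-1'))  # café

# Plus signs are left alone; use unquote_plus for form data.
print(unquote('a+b'))                         # a+b
```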

.. function:: unquote_plus(string, encoding='utf-8', errors='replace')

   Like :func:`unquote`, but also replace plus signs by spaces, as required for
   unquoting HTML form values.

   *string* must be a :class:`str`.

   Example: ``unquote_plus('/El+Ni%C3%B1o/')`` yields ``'/El Niño/'``.


.. function:: unquote_to_bytes(string)

   Replace ``%xx`` escapes by their single-octet equivalent, and return a
   :class:`bytes` object.

   *string* may be either a :class:`str` or a :class:`bytes`.

   If it is a :class:`str`, unescaped non-ASCII characters in *string*
   are encoded into UTF-8 bytes.

   Example: ``unquote_to_bytes('a%26%EF')`` yields ``b'a&\xef'``.

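A sketch of the byte-oriented pair :func:`quote_from_bytes` and :func:`unquote_to_bytes`, including the UTF-8 encoding of unescaped non-ASCII characters in a :class:`str` argument (the sample values are invented):

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

raw = b'a&\xef'
encoded = quote_from_bytes(raw)
print(encoded)                           # a%26%EF
print(unquote_to_bytes(encoded) == raw)  # True

# The unescaped 'é' is encoded as UTF-8; '%21' becomes b'!'.
print(unquote_to_bytes('caf\u00e9%21'))  # b'caf\xc3\xa9!'
```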

.. function:: urlencode(query, doseq=False, safe='', encoding=None, errors=None)

   Convert a mapping object or a sequence of two-element tuples, which may
   contain :class:`str` or :class:`bytes` objects, to a "percent-encoded"
   string. If the resultant string is to be used as *data* for a POST
   operation with the :func:`~urllib.request.urlopen` function, it should be
   encoded to bytes first; otherwise it would result in a :exc:`TypeError`.

   The resulting string is a series of ``key=value`` pairs separated by ``'&'``
   characters, where both *key* and *value* are quoted using :func:`quote_plus`
   above. When a sequence of two-element tuples is used as the *query*
   argument, the first element of each tuple is a key and the second is a
   value. The value element in itself can be a sequence and in that case, if
   the optional parameter *doseq* evaluates to true, individual
   ``key=value`` pairs separated by ``'&'`` are generated for each element of
   the value sequence for the key. The order of parameters in the encoded
   string will match the order of parameter tuples in the sequence.

   The *safe*, *encoding* and *errors* parameters are passed down to
   :func:`quote_plus` for encoding; *encoding* and *errors* are not used for
   :class:`bytes` elements, which need no string-to-bytes encoding.

   To reverse this encoding process, :func:`parse_qs` and :func:`parse_qsl` are
   provided in this module to parse query strings into Python data structures.

   Refer to :ref:`urllib examples <urllib-examples>` to find out how the
   :func:`urlencode` function can be used to generate the query string of a URL
   or the data for a POST request.

   .. versionchanged:: 3.2
      Query parameter supports bytes and string objects.

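The following sketch (with invented parameter values) shows the effect of *doseq* on a sequence value, the round trip through :func:`parse_qs`, and the explicit encoding to bytes needed for POST data:

```python
from urllib.parse import urlencode, parse_qs

params = [('q', 'El Niño'), ('tag', ['a', 'b'])]

# Without doseq, the list is converted with str() and quoted as one value.
print(urlencode(params))
# q=El+Ni%C3%B1o&tag=%5B%27a%27%2C+%27b%27%5D

# With doseq=True, each element of the sequence becomes its own pair.
encoded = urlencode(params, doseq=True)
print(encoded)            # q=El+Ni%C3%B1o&tag=a&tag=b
print(parse_qs(encoded))  # {'q': ['El Niño'], 'tag': ['a', 'b']}

# POST data must be bytes, e.g. for urllib.request.urlopen.
data = urlencode({'name': 'Guido'}).encode('ascii')
print(data)               # b'name=Guido'
```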

.. seealso::

   :rfc:`3986` - Uniform Resource Identifiers
      This is the current standard (STD66). Any changes to the urllib.parse
      module should conform to this. Certain deviations could be observed, which
      are mostly for backward compatibility purposes and for certain de facto
      parsing requirements as commonly observed in major browsers.

   :rfc:`2732` - Format for Literal IPv6 Addresses in URLs.
      This specifies the parsing requirements of IPv6 URLs.

   :rfc:`2396` - Uniform Resource Identifiers (URI): Generic Syntax
      Document describing the generic syntactic requirements for both Uniform Resource
      Names (URNs) and Uniform Resource Locators (URLs).

   :rfc:`2368` - The mailto URL scheme.
      Parsing requirements for mailto URL schemes.

   :rfc:`1808` - Relative Uniform Resource Locators
      This Request For Comments includes the rules for joining an absolute and a
      relative URL, including a fair number of "Abnormal Examples" which govern the
      treatment of border cases.

   :rfc:`1738` - Uniform Resource Locators (URL)
      This specifies the formal syntax and semantics of absolute URLs.