.. highlightlang:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>

Unicode Objects
^^^^^^^^^^^^^^^


Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:


.. c:type:: Py_UNICODE

   This type represents the storage type which is used by Python internally as
   the basis for holding Unicode ordinals. Python's default builds use a 16-bit
   type for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It
   is also possible to build a UCS4 version of Python (most recent Linux
   distributions come with UCS4 builds of Python). These builds then use a
   32-bit type for :c:type:`Py_UNICODE` and store Unicode data internally as
   UCS4. On platforms where :c:type:`wchar_t` is available and compatible with
   the chosen Python Unicode build variant, :c:type:`Py_UNICODE` is a typedef
   alias for :c:type:`wchar_t` to enhance native platform compatibility. On all
   other platforms, :c:type:`Py_UNICODE` is a typedef alias for either
   :c:type:`unsigned short` (UCS2) or :c:type:`unsigned long` (UCS4).

Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.

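To make the difference between the two build variants concrete, the following
sketch shows how a code point outside the Basic Multilingual Plane needs two
16-bit units (a surrogate pair) where a 32-bit unit would hold it directly. It
is plain C and does not use the Python API; the helper name
``to_utf16_units`` is hypothetical and only serves as an illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Python API: split a code point into
 * the 16-bit units that a 16-bit (UCS2) storage type would need.
 * Returns the number of units written to out (1 or 2). */
static size_t
to_utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: one unit suffices */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

An extension that assumes one storage unit per code point would therefore
compute different lengths and offsets depending on the build it runs under.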

.. c:type:: PyUnicodeObject

   This subtype of :c:type:`PyObject` represents a Python Unicode object.


.. c:var:: PyTypeObject PyUnicode_Type

   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
   is exposed to Python code as ``unicode`` and ``types.UnicodeType``.

The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:


.. c:function:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.

   .. versionchanged:: 2.2
      Allowed subtypes to be accepted.


.. c:function:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.

   .. versionadded:: 2.2


.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the object. *o* has to be a :c:type:`PyUnicodeObject` (not
   checked).

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the object's internal buffer in bytes. *o* has to be a
   :c:type:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)

   Return a pointer to the internal :c:type:`Py_UNICODE` buffer of the object. *o*
   has to be a :c:type:`PyUnicodeObject` (not checked).


.. c:function:: const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to the internal buffer of the object. *o* has to be a
   :c:type:`PyUnicodeObject` (not checked).


.. c:function:: int PyUnicode_ClearFreeList()

   Clear the free list. Return the total number of freed items.

   .. versionadded:: 2.6


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.


.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a whitespace character.


.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a lowercase character.


.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an uppercase character.


.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a titlecase character.


.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a linebreak character.


.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a decimal character.


.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a digit character.


.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a numeric character.


.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphabetic character.


.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphanumeric character.

These APIs can be used for fast direct character conversions:


.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.


.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.


.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.


.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a decimal positive integer. Return
   ``-1`` if this is not possible. This macro does not raise exceptions.


.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible. This macro does not raise exceptions.


.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
   possible. This macro does not raise exceptions.


Plain Py_UNICODE
""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:


.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the :c:type:`Py_UNICODE` buffer *u* of the given
   size. *u* may be *NULL* which causes the contents to be undefined. It is the
   user's responsibility to fill in the needed data. The buffer is copied into
   the new object. If the buffer is not *NULL*, the return value might be a
   shared object. Therefore, modification of the resulting Unicode object is
   only allowed when *u* is *NULL*.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the char buffer *u*. The bytes will be interpreted
   as being UTF-8 encoded. *u* may also be *NULL* which
   causes the contents to be undefined. It is the user's responsibility to fill in
   the needed data. The buffer is copied into the new object. If the buffer is not
   *NULL*, the return value might be a shared object. Therefore, modification of
   the resulting Unicode object is only allowed when *u* is *NULL*.

   .. versionadded:: 2.6


.. c:function:: PyObject *PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.

   .. versionadded:: 2.6


.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)

   Take a C :c:func:`printf`\ -style *format* string and a variable number of
   arguments, calculate the size of the resulting Python unicode string and return
   a string with the values formatted into it. The variable arguments must be C
   types and must correspond exactly to the format characters in the *format*
   string. The following format characters are allowed:

   .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
   .. % because not all compilers support the %z width modifier -- we fake it
   .. % when necessary via interpolating PY_FORMAT_SIZE_T.

   +-------------------+---------------------+--------------------------------+
   | Format Characters | Type                | Comment                        |
   +===================+=====================+================================+
   | :attr:`%%`        | *n/a*               | The literal % character.       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%c`        | int                 | A single character,            |
   |                   |                     | represented as a C int.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%d`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%d")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%u`        | unsigned int        | Exactly equivalent to          |
   |                   |                     | ``printf("%u")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%ld`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%ld")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
   |                   |                     | ``printf("%lu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zd")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zu`       | size_t              | Exactly equivalent to          |
   |                   |                     | ``printf("%zu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%i`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%i")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%x`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%x")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%s`        | char\*              | A null-terminated C character  |
   |                   |                     | array.                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%p`        | void\*              | The hex representation of a C  |
   |                   |                     | pointer. Mostly equivalent to  |
   |                   |                     | ``printf("%p")`` except that   |
   |                   |                     | it is guaranteed to start with |
   |                   |                     | the literal ``0x`` regardless  |
   |                   |                     | of what the platform's         |
   |                   |                     | ``printf`` yields.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%U`        | PyObject\*          | A unicode object.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
   |                   |                     | *NULL*) and a null-terminated  |
   |                   |                     | C character array as a second  |
   |                   |                     | parameter (which will be used, |
   |                   |                     | if the first parameter is      |
   |                   |                     | *NULL*).                       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling          |
   |                   |                     | :func:`PyObject_Unicode`.      |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling          |
   |                   |                     | :func:`PyObject_Repr`.         |
   +-------------------+---------------------+--------------------------------+

   An unrecognized format character causes all the rest of the format string to be
   copied as-is to the result string, and any extra arguments discarded.

   .. versionadded:: 2.6


.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)

   Identical to :func:`PyUnicode_FromFormat` except that it takes exactly two
   arguments.

   .. versionadded:: 2.6


.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal
   :c:type:`Py_UNICODE` buffer, *NULL* if *unicode* is not a Unicode object.
   Note that the resulting :c:type:`Py_UNICODE*` string may contain embedded
   null characters, which would cause the string to be truncated when used in
   most C functions.


.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the length of the Unicode object.

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.

   String and other char buffer compatible objects are decoded according to the
   given *encoding* and using the error handling defined by *errors*. Both can be
   *NULL* to have the interface use the default values (see the next section for
   details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns *NULL* if there was an error. The caller is responsible for
   decref'ing the returned object.


.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
   throughout the interpreter whenever coercion to Unicode is needed.

If the platform supports :c:type:`wchar_t` and provides a header file wchar.h,
Python can interface directly to this type using the following functions.
Support is optimized if Python's own :c:type:`Py_UNICODE` type is identical to
the system's :c:type:`wchar_t`.


wchar_t Support
"""""""""""""""

:c:type:`wchar_t` support for platforms which support it:

.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given *size*.
   Return *NULL* on failure.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
   *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :c:type:`wchar_t` characters
   copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :c:type:`wchar_t` string is 0-terminated in case this is
   required by the application. Also, note that the :c:type:`wchar_t*` string
   might contain null characters, which would cause the string to be truncated
   when used with most C functions.

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type and used an :c:type:`int`
      type for *size*. This might require changes in your code for properly
      supporting 64-bit systems.

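The two caveats about :c:type:`wchar_t` buffers (missing terminator, embedded
null characters) can be illustrated without the Python API. The sketch below is
plain C; ``terminate_wide`` is a hypothetical helper that adds the terminator
after a copy routine has reported how many characters it wrote, which is one
possible way for a caller to meet the responsibility described above.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper, not part of the Python API: terminate a buffer that
 * was filled by a routine which reports the number of characters copied but
 * may not have written a terminator. capacity is the total number of
 * wchar_t slots in buf. */
static void
terminate_wide(wchar_t *buf, size_t capacity, size_t copied)
{
    if (capacity == 0)
        return;                  /* nothing can be stored at all */
    if (copied >= capacity)
        copied = capacity - 1;   /* buffer is full: give up the last slot */
    buf[copied] = L'\0';
}
```

Even with a terminator in place, functions such as ``wcslen`` stop at the first
embedded null character, so the count returned by the copy routine is the only
reliable length.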

.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as the ones of the built-in :func:`unicode` Unicode
object constructor.

Setting *encoding* to *NULL* causes the default encoding to be used, which is
ASCII. The file system calls should use :c:data:`Py_FileSystemDefaultEncoding`
as the encoding for file names. This variable should be treated as read-only: on
some systems, it will be a pointer to a static string, on others, it will change
at run-time (such as when the application invokes setlocale).

Error handling is set by *errors*, which may also be set to *NULL*, meaning to
use the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface. For simplicity, only deviations from
the following generic ones are documented.


Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`unicode` built-in function. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
   string object. *encoding* and *errors* have the same meaning as the parameters
   of the same name in the Unicode :meth:`encode` method. The codec to be used is
   looked up using the Python codec registry. Return *NULL* if an exception was
   raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)

   Encode a Unicode object and return the result as Python string object.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the Unicode :meth:`encode` method. The codec to be used is looked up using
   the Python codec registry. Return *NULL* if an exception was raised by the
   codec.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

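The meaning of *consumed* for a chunk that ends in the middle of a character
can be sketched in plain C. The helper below is hypothetical and is not how the
codec is implemented: it only counts how many leading bytes of a chunk form
complete UTF-8 sequences, assuming the input is otherwise well formed, and
leaves all error handling aside.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Python API: return how many leading
 * bytes of s form complete UTF-8 sequences. Assumes well-formed input apart
 * from a possibly truncated final sequence. */
static size_t
utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = 0;
    while (i < size) {
        unsigned char c = s[i];
        size_t need;
        if (c < 0x80)
            need = 1;                    /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 2;                    /* 110xxxxx */
        else if ((c & 0xF0) == 0xE0)
            need = 3;                    /* 1110xxxx */
        else if ((c & 0xF8) == 0xF0)
            need = 4;                    /* 11110xxx */
        else
            break;                       /* not a lead byte: stop counting */
        if (need > size - i)
            break;                       /* sequence is cut off by the chunk end */
        i += need;
    }
    return i;
}
```

A caller reading a stream would keep the bytes after the returned offset and
prepend them to the next chunk before decoding again.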

.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)

   Encode a Unicode object using UTF-8 and return the result as Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


UTF-32 Codecs
"""""""""""""

These are the UTF-32 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first four bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output.

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   In a narrow build, code points outside the BMP will be decoded as surrogate pairs.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6

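The byte order convention can be illustrated with a plain C sketch of BOM
detection. The helper ``utf32_bom_order`` is hypothetical and does not use the
Python API; it only shows which four-byte patterns correspond to which value,
using the same ``-1`` / ``0`` / ``1`` convention as ``*byteorder``.

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Python API: inspect the first four
 * bytes of a UTF-32 buffer. Returns -1 for a little-endian BOM, 1 for a
 * big-endian BOM and 0 when no BOM is present. */
static int
utf32_bom_order(const unsigned char *s, size_t size)
{
    if (size < 4)
        return 0;                        /* too short to hold a BOM */
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00)
        return -1;                       /* U+FEFF stored little endian */
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF)
        return 1;                        /* U+FEFF stored big endian */
    return 0;
}
```

A result of ``0`` here means only that no BOM was found; in that case the
description above says the decoder keeps the order it started with.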

.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
   by four) as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.6


Sandro Tosi98ed08f2012-01-14 16:42:02 +0100568.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
Georg Brandlf6842722008-01-19 22:08:21 +0000569
570 Return a Python bytes object holding the UTF-32 encoded value of the Unicode
Georg Brandl579a3582009-09-18 21:35:59 +0000571 data in *s*. Output is written according to the following byte order::
Georg Brandlf6842722008-01-19 22:08:21 +0000572
573 byteorder == -1: little endian
574 byteorder == 0: native byte order (writes a BOM mark)
575 byteorder == 1: big endian
576
577 If byteorder is ``0``, the output string will always start with the Unicode BOM
578 mark (U+FEFF). In the other two modes, no BOM mark is prepended.
579
580 If *Py_UNICODE_WIDE* is not defined, surrogate pairs will be output
581 as a single codepoint.
582
583 Return *NULL* if an exception was raised by the codec.
584
585 .. versionadded:: 2.6
586
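The three *byteorder* modes of :c:func:`PyUnicode_EncodeUTF32` can be
illustrated with a small stand-alone model.  This is a sketch under stated
assumptions, not the codec itself: ``utf32_encode``, ``put_u32`` and
``host_is_little_endian`` are hypothetical names, the input is an array of code
points rather than a :c:type:`Py_UNICODE` buffer, and the caller is assumed to
provide an output buffer of at least ``4 * (n + 1)`` bytes.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Store one 32-bit value at out[0..3] in the requested byte order. */
   static void
   put_u32(unsigned char *out, uint32_t v, int little)
   {
       int i;
       for (i = 0; i < 4; i++) {
           int shift = little ? 8 * i : 8 * (3 - i);
           out[i] = (unsigned char)((v >> shift) & 0xFF);
       }
   }

   /* Detect the byte order of the machine running the code. */
   static int
   host_is_little_endian(void)
   {
       const uint16_t probe = 1;
       return *(const unsigned char *)&probe == 1;
   }

   /* Hypothetical model: encode n code points and return the number of
      bytes written.  A BOM (U+FEFF) is written only when byteorder == 0. */
   static size_t
   utf32_encode(const uint32_t *cps, size_t n, int byteorder,
                unsigned char *out)
   {
       int little = byteorder < 0 ||
                    (byteorder == 0 && host_is_little_endian());
       size_t pos = 0, i;

       if (byteorder == 0) {
           put_u32(out, 0xFEFF, little);
           pos = 4;
       }
       for (i = 0; i < n; i++) {
           put_u32(out + pos, cps[i], little);
           pos += 4;
       }
       return pos;
   }

Encoding two code points therefore yields 8 bytes in the explicit modes and 12
bytes in native mode, the extra four being the BOM.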

.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)

   Return a Python string using the UTF-32 encoding in native byte order. The
   string always starts with a BOM.  Error handling is "strict".  Return
   *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6


UTF-16 Codecs
"""""""""""""

These are the UTF-16 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object.  *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first two bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string.  If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output (where it will result in
   either a ``\ufeff`` or a ``\ufffe`` character).

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size* and an :c:type:`int *`
      type for *consumed*. This might require changes in your code for
      properly supporting 64-bit systems.

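The two kinds of "trailing incomplete" input named for
:c:func:`PyUnicode_DecodeUTF16Stateful` (an odd byte count and a split
surrogate pair) can be made concrete with a small model.  The function below is
a hypothetical helper, not the decoder: it assumes little-endian input and only
computes how many leading bytes form complete units, which is the kind of value
a caller would expect to find in *consumed*.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical model for little-endian UTF-16 input.
      Returns the length of the prefix that ends on a complete code point. */
   static size_t
   utf16le_decodable_prefix(const unsigned char *s, size_t size)
   {
       size_t usable = size - (size % 2);   /* drop a dangling odd byte */

       if (usable >= 2) {
           unsigned int last = s[usable - 2] |
                               ((unsigned int)s[usable - 1] << 8);
           if (last >= 0xD800 && last <= 0xDBFF)
               usable -= 2;   /* high surrogate still waiting for its pair */
       }
       return usable;
   }

A streaming caller would keep the bytes from ``s + consumed`` onwards and
prepend them to the next chunk before decoding again.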

.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-16 encoded value of the Unicode
   data in *s*.  Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is defined, a single :c:type:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

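The surrogate-pair case mentioned for :c:func:`PyUnicode_EncodeUTF16` follows
the standard UTF-16 arithmetic.  As a minimal sketch, the hypothetical helper
``utf16_units`` below splits one code point into one or two 16-bit units; it
assumes a valid code point no larger than U+10FFFF and does no error checking.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical helper: write the UTF-16 code units for cp into units[]
      and return how many were written (1 or 2). */
   static int
   utf16_units(uint32_t cp, uint16_t units[2])
   {
       if (cp < 0x10000) {
           units[0] = (uint16_t)cp;    /* BMP: one unit, value unchanged */
           return 1;
       }
       cp -= 0x10000;                  /* 20 bits remain */
       units[0] = (uint16_t)(0xD800 + (cp >> 10));     /* high surrogate */
       units[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* low surrogate */
       return 2;
   }

For example, U+1F600 becomes the pair ``0xD83D 0xDE00``.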

.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)

   Return a Python string using the UTF-16 encoding in native byte order. The
   string always starts with a BOM.  Error handling is "strict".  Return
   *NULL* if an exception was raised by the codec.


UTF-7 Codecs
""""""""""""

These are the UTF-7 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
   *s*.  Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`.  If
   *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
   be treated as an error.  Those bytes will not be decoded and the number of
   bytes that have been decoded will be stored in *consumed*.


.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
   return a Python bytes object.  Return *NULL* if an exception was raised by
   the codec.

   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
   special meaning) will be encoded in base-64.  If *base64WhiteSpace* is
   nonzero, whitespace will be encoded in base-64.  Both are set to zero for the
   Python "utf-7" codec.


Unicode-Escape Codecs
"""""""""""""""""""""

These are the "Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*.  Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
   return a Python string object.  Return *NULL* if an exception was raised by the
   codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Unicode-Escape and return the result as a Python
   string object.  Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


Raw-Unicode-Escape Codecs
"""""""""""""""""""""""""

These are the "Raw Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*.  Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
   and return a Python string object.  Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Raw-Unicode-Escape and return the result as a
   Python string object. Error handling is "strict". Return *NULL* if an exception
   was raised by the codec.


Latin-1 Codecs
""""""""""""""

These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
ordinals and only these are accepted by the codecs during encoding.

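Because Latin-1 bytes equal the first 256 Unicode ordinals, strict encoding
amounts to a range check followed by a narrowing copy.  The model below is a
sketch, not the codec: ``latin1_encode_strict`` is a hypothetical helper that
works on an array of code points and reports the index of the first character
that cannot be encoded instead of raising an exception.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical model of strict Latin-1 encoding.
      Returns 0 on success.  On failure returns -1 and stores the index of
      the first ordinal above 255 in *bad. */
   static int
   latin1_encode_strict(const uint32_t *s, size_t n, unsigned char *out,
                        size_t *bad)
   {
       size_t i;
       for (i = 0; i < n; i++) {
           if (s[i] > 0xFF) {
               *bad = i;       /* outside the first 256 ordinals */
               return -1;
           }
           out[i] = (unsigned char)s[i];
       }
       return 0;
   }

The same shape applies to the ASCII codec with a limit of ``0x7F`` instead of
``0xFF``.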

.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*.  Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and return
   a Python string object.  Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)

   Encode a Unicode object using Latin-1 and return the result as a Python string
   object.  Error handling is "strict".  Return *NULL* if an exception was raised
   by the codec.


ASCII Codecs
""""""""""""

These are the ASCII codec APIs.  Only 7-bit ASCII data is accepted. All other
codes generate errors.


.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
   *s*.  Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and return a
   Python string object.  Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)

   Encode a Unicode object using ASCII and return the result as a Python string
   object.  Error handling is "strict".  Return *NULL* if an exception was raised
   by the codec.


Character Map Codecs
""""""""""""""""""""

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.

Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or None
(meaning "undefined mapping" and causing an error).

Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or None
(meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode or
Latin-1 ordinal, respectively. Because of this, mappings only need to contain
those mappings which map characters to different code points.

These are the mapping codec APIs:

.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object.  Return *NULL* if an exception was raised by the
   codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can
   be a dictionary mapping bytes, or a Unicode string, which is treated as a
   lookup table.  Byte values greater than the length of the string and U+FFFE
   "characters" are treated as "undefined mapping".

   .. versionchanged:: 2.4
      Allowed unicode string as mapping argument.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

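The lookup-table form of *mapping* accepted by
:c:func:`PyUnicode_DecodeCharmap` can be pictured as an array indexed by byte
value.  The following is a minimal sketch of that idea in plain C, assuming
strict error handling: ``charmap_decode`` and ``CHARMAP_UNDEFINED`` are
hypothetical names, and the table holds 16-bit code units in place of a Unicode
string object.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define CHARMAP_UNDEFINED 0xFFFEu   /* marks "undefined mapping" */

   /* Hypothetical model: decode n bytes through a lookup table.
      Returns 0 on success, -1 at the first undefined mapping. */
   static int
   charmap_decode(const unsigned char *s, size_t n,
                  const uint16_t *table, size_t table_len, uint16_t *out)
   {
       size_t i;
       for (i = 0; i < n; i++) {
           if (s[i] >= table_len || table[s[i]] == CHARMAP_UNDEFINED)
               return -1;      /* byte beyond the table, or marked U+FFFE */
           out[i] = table[s[i]];
       }
       return 0;
   }

A table shorter than 256 entries is enough when the higher byte values are
meant to be invalid.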

.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return a Python string object. Return *NULL* if an
   exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

   Encode a Unicode object using the given *mapping* object and return the result
   as a Python string object.  Error handling is "strict".  Return *NULL* if an
   exception was raised by the codec.

The following codec API is special in that it maps Unicode to Unicode.


.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)

   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
   character mapping *table* to it and return the resulting Unicode object.  Return
   *NULL* when an exception was raised by the codec.

   The mapping *table* must map Unicode ordinal integers to Unicode ordinal
   integers or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well.  Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

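The three outcomes of a table lookup in :c:func:`PyUnicode_TranslateCharmap`
(replace, delete, copy unchanged) can be modelled without any Python objects.
This is a sketch only: ``translate``, ``translate_entry`` and
``TRANSLATE_DELETE`` are hypothetical names, the table is a plain array searched
linearly, and ``TRANSLATE_DELETE`` stands in for a ``None`` entry.

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   #define TRANSLATE_DELETE (-1L)      /* plays the role of None */

   typedef struct {
       uint32_t from;                  /* ordinal to look up */
       long to;                        /* replacement, or TRANSLATE_DELETE */
   } translate_entry;

   /* Hypothetical model: translate n ordinals, return the output length.
      out must have room for n ordinals. */
   static size_t
   translate(const uint32_t *s, size_t n,
             const translate_entry *table, size_t ntable, uint32_t *out)
   {
       size_t i, j, len = 0;

       for (i = 0; i < n; i++) {
           long mapped = (long)s[i];   /* unmapped ordinals are copied as-is */
           for (j = 0; j < ntable; j++) {
               if (table[j].from == s[i]) {
                   mapped = table[j].to;
                   break;
               }
           }
           if (mapped != TRANSLATE_DELETE)
               out[len++] = (uint32_t)mapped;
       }
       return len;
   }

With a table that maps ``a`` to ``A`` and deletes ``-``, the input ``a-b``
becomes ``Ab``.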

MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions.  Note that MBCS (or
DBCS) is a class of encodings, not just one.  The target encoding is defined by
the user settings on the machine running the codec.


.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte and the number of bytes that have been decoded will be
   stored in *consumed*.

   .. versionadded:: 2.5


.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return a
   Python string object.  Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python string
   object.  Error handling is "strict".  Return *NULL* if an exception was raised
   by the codec.


Methods & Slots
"""""""""""""""

.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

They all return *NULL* or ``-1`` if an exception occurs.


.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string, giving a list of Unicode strings.  If *sep* is *NULL*, splitting
   will be done at all whitespace substrings.  Otherwise, splits occur at the given
   separator.  At most *maxsplit* splits will be done.  If negative, no limit is
   set.  Separators are not included in the resulting list.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *maxsplit*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break.  If *keepend* is 0, the line break
   characters are not included in the resulting strings.


.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well.  Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be *NULL*, which indicates
   that the default error handling should be used.


.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given *separator* and return the resulting
   Unicode string.


.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return 1 if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
   0 otherwise. Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.

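The prefix and suffix modes of :c:func:`PyUnicode_Tailmatch` can be modelled on
ordinary C strings.  The helper below is a hypothetical illustration, not the
API itself: it assumes ``0 <= start <= end <= strlen(str)``, treats any
non-positive *direction* as a prefix match, and has no error return.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical model of a tail match on the slice str[start:end].
      direction > 0: suffix match; otherwise: prefix match. */
   static int
   tailmatch(const char *str, const char *substr,
             size_t start, size_t end, int direction)
   {
       size_t sublen = strlen(substr);

       if (sublen > end - start)
           return 0;                   /* pattern longer than the slice */
       if (direction > 0)
           return memcmp(str + end - sublen, substr, sublen) == 0;
       return memcmp(str + start, substr, sublen) == 0;
   }

Note that the match is anchored to the slice boundaries, so narrowing *start*
or *end* changes the result even though the underlying string is the same.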

.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
   backward search).  The return value is the index of the first match; a value of
   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   occurred and an exception has been set.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.

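The effect of *direction* in :c:func:`PyUnicode_Find` is easiest to see on a
string that contains the pattern twice.  The function below is a hypothetical
model on C strings under the assumption ``0 <= start <= end <= strlen(str)``;
it returns an index relative to the start of ``str`` or ``-1`` when nothing
matches, and it cannot produce the ``-2`` error value of the real function.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical model: search str[start:end] for substr.
      direction > 0 scans forward, otherwise backward. */
   static long
   model_find(const char *str, const char *substr,
              size_t start, size_t end, int direction)
   {
       size_t sublen = strlen(substr);
       size_t i;

       if (sublen > end - start)
           return -1;
       if (direction > 0) {
           for (i = start; i + sublen <= end; i++)
               if (memcmp(str + i, substr, sublen) == 0)
                   return (long)i;
       }
       else {
           /* i runs from end - sublen down to start, inclusive */
           for (i = end - sublen + 1; i-- > start; )
               if (memcmp(str + i, substr, sublen) == 0)
                   return (long)i;
       }
       return -1;
   }

A forward search for ``bc`` in ``abcabc`` finds index 1, a backward search
finds index 4.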

.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``.  Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type and used an :c:type:`int`
      type for *start* and *end*. This might require changes in your code for
      properly supporting 64-bit systems.

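"Non-overlapping" in :c:func:`PyUnicode_Count` means that the search resumes
after the end of each match rather than at the next character.  The sketch
below models this on C strings; ``count_nonoverlapping`` is a hypothetical
helper that assumes ``0 <= start <= end <= strlen(str)`` and a non-empty
pattern.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical model: count non-overlapping matches in str[start:end]. */
   static long
   count_nonoverlapping(const char *str, const char *substr,
                        size_t start, size_t end)
   {
       size_t sublen = strlen(substr);
       size_t i = start;
       long count = 0;

       if (sublen == 0)
           return 0;           /* empty patterns are not modelled here */
       while (i + sublen <= end) {
           if (memcmp(str + i, substr, sublen) == 0) {
               count++;
               i += sublen;    /* skip past the match: no overlap */
           }
           else {
               i++;
           }
       }
       return count;
   }

Counting ``aa`` in ``aaaa`` therefore gives 2, not 3.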

.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == -1 means replace all
   occurrences.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *maxcount*. This might
      require changes in your code for properly supporting 64-bit systems.

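   A minimal sketch of a bounded replacement with ``PyUnicode_Replace``. The
   helper ``replace_limited`` is hypothetical; it assumes an initialized
   interpreter and returns a new reference that the caller must release.

   .. code-block:: c

      #include <Python.h>

      /* Hypothetical helper: replace at most `maxcount` occurrences of
         `old_text` in `text` with `new_text` (all UTF-8 encoded C strings).
         Returns a new reference, or NULL with an exception set. */
      static PyObject *
      replace_limited(const char *text, const char *old_text,
                      const char *new_text, Py_ssize_t maxcount)
      {
          PyObject *result = NULL;
          PyObject *str = NULL, *substr = NULL, *replstr = NULL;

          str = PyUnicode_FromString(text);
          if (str == NULL)
              goto done;
          substr = PyUnicode_FromString(old_text);
          if (substr == NULL)
              goto done;
          replstr = PyUnicode_FromString(new_text);
          if (replstr == NULL)
              goto done;

          /* maxcount == -1 would replace every occurrence. */
          result = PyUnicode_Replace(str, substr, replstr, maxcount);

      done:
          Py_XDECREF(str);
          Py_XDECREF(substr);
          Py_XDECREF(replstr);
          return result;
      }

   For instance, ``replace_limited("a-b-c", "-", "+", 1)`` would produce a
   Unicode object equal to ``a+b-c``, while a *maxcount* of ``-1`` would
   produce ``a+b+c``.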

.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
   respectively.


.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   :exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
   with a :exc:`UnicodeDecodeError`.

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.

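   The three possible results listed above could be handled along the
   following lines. This is a sketch only: the helper ``unicode_equal`` is a
   hypothetical name, and turning :const:`Py_NotImplemented` into a
   :exc:`TypeError` is a design choice of the example, not something the API
   requires. The result is treated as a new reference.

   .. code-block:: c

      #include <Python.h>

      /* Hypothetical helper: 1 if left == right, 0 if they differ,
         -1 with an exception set if the comparison failed or the
         operands could not be compared. */
      static int
      unicode_equal(PyObject *left, PyObject *right)
      {
          int equal;
          PyObject *res = PyUnicode_RichCompare(left, right, Py_EQ);

          if (res == NULL)
              return -1;              /* an exception was raised */

          if (res == Py_NotImplemented) {
              /* Unknown type combination: report it as an error here. */
              PyErr_SetString(PyExc_TypeError,
                              "operands cannot be compared as Unicode");
              equal = -1;
          }
          else {
              equal = (res == Py_True);
          }
          Py_DECREF(res);
          return equal;
      }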

.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``. The *args* argument must be a tuple.

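   A minimal sketch of building the *args* tuple and formatting it with
   ``PyUnicode_Format``. The helper ``format_pair`` and the format string
   ``"%s = %d"`` are illustrative assumptions; ``Py_BuildValue`` is used here
   to create the tuple, and *name* is assumed to be plain ASCII.

   .. code-block:: c

      #include <Python.h>

      /* Hypothetical helper: return the Unicode object "<name> = <value>"
         as a new reference, or NULL with an exception set. */
      static PyObject *
      format_pair(const char *name, long value)
      {
          PyObject *format, *args, *result;

          format = PyUnicode_FromString("%s = %d");
          if (format == NULL)
              return NULL;

          /* The parentheses make Py_BuildValue return a 2-tuple. */
          args = Py_BuildValue("(sl)", name, value);
          if (args == NULL) {
              Py_DECREF(format);
              return NULL;
          }

          /* Equivalent in spirit to: format % args */
          result = PyUnicode_Format(format, args);

          Py_DECREF(format);
          Py_DECREF(args);
          return result;
      }

   Under these assumptions, ``format_pair("answer", 42)`` yields a Unicode
   object equal to ``answer = 42``.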

.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return ``1`` (true)
   or ``0`` (false) accordingly.

   *element* has to coerce to a one-element Unicode string. ``-1`` is returned if
   there was an error.
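
   A short sketch of a membership test that keeps the error case separate
   from the true and false results. The helper ``contains_char`` is a
   hypothetical name, and the sketch assumes an initialized interpreter.

   .. code-block:: c

      #include <Python.h>

      /* Hypothetical helper: 1 if the one-character UTF-8 string `ch`
         occurs in `container`, 0 if it does not, -1 on error. */
      static int
      contains_char(PyObject *container, const char *ch)
      {
          int rc;
          PyObject *element = PyUnicode_FromString(ch);

          if (element == NULL)
              return -1;

          rc = PyUnicode_Contains(container, element);
          Py_DECREF(element);
          return rc;
      }

   A caller should compare the result against ``-1`` before treating it as a
   truth value, since ``-1`` is also non-zero in C.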