.. highlightlang:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>

Unicode Objects
^^^^^^^^^^^^^^^


Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:


.. c:type:: Py_UNICODE

   This type represents the storage type which is used by Python internally as
   the basis for holding Unicode ordinals. Python's default builds use a 16-bit
   type for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It
   is also possible to build a UCS4 version of Python (most recent Linux
   distributions come with UCS4 builds of Python). These builds then use a
   32-bit type for :c:type:`Py_UNICODE` and store Unicode data internally as
   UCS4. On platforms where :c:type:`wchar_t` is available and compatible with
   the chosen Python Unicode build variant, :c:type:`Py_UNICODE` is a typedef
   alias for :c:type:`wchar_t` to enhance native platform compatibility. On all
   other platforms, :c:type:`Py_UNICODE` is a typedef alias for either
   :c:type:`unsigned short` (UCS2) or :c:type:`unsigned long` (UCS4).

Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.

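Because a UCS2 build has only 16 bits per :c:type:`Py_UNICODE` unit, a code
point outside the Basic Multilingual Plane must be stored as a surrogate pair of
two units, while a UCS4 build stores it in one. The sketch below illustrates
that arithmetic; ``sketch_store_ucs2`` is a hypothetical helper written for this
example, not part of the Python C API:

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical helper: store code point `cp` (at most U+10FFFF) the way a
      UCS2 build has to, as one or two 16-bit units, and return the number of
      units written to `out`. A UCS4 build would store `cp` unchanged. */
   static size_t sketch_store_ucs2(uint32_t cp, uint16_t out[2])
   {
       if (cp < 0x10000) {
           out[0] = (uint16_t)cp;                   /* BMP: one unit */
           return 1;
       }
       cp -= 0x10000;                               /* 20 bits remain */
       out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high (lead) surrogate */
       out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low (trail) surrogate */
       return 2;
   }

For example, U+1F600 occupies two units in a UCS2 build (``0xD83D 0xDE00``) but
one unit in a UCS4 build, so the same string can report different lengths under
the two build types.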

.. c:type:: PyUnicodeObject

   This subtype of :c:type:`PyObject` represents a Python Unicode object.


.. c:var:: PyTypeObject PyUnicode_Type

   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
   is exposed to Python code as ``unicode`` and ``types.UnicodeType``.

The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:


.. c:function:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.

   .. versionchanged:: 2.2
      Allowed subtypes to be accepted.


.. c:function:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.

   .. versionadded:: 2.2


.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the object, counted in :c:type:`Py_UNICODE` units. *o* has
   to be a :c:type:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the object's internal buffer in bytes. *o* has to be a
   :c:type:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.

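The two size macros measure the same buffer in different units:
:c:func:`PyUnicode_GET_SIZE` counts :c:type:`Py_UNICODE` units, while
:c:func:`PyUnicode_GET_DATA_SIZE` counts bytes, so in CPython 2.x the byte count
works out to the unit count multiplied by ``sizeof(Py_UNICODE)``. The sketch
below models that relationship with a hypothetical ``mock_unicode_object``
struct and ``MOCK_*`` macros; the struct is not the real
:c:type:`PyUnicodeObject` layout, and a 16-bit (UCS2) unit is assumed:

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical unit type: 16 bits as in a UCS2 build (uint32_t for UCS4). */
   typedef uint16_t mock_unicode_t;

   /* Hypothetical object holding only the two fields the macros need; this is
      NOT the real PyUnicodeObject layout. */
   typedef struct {
       ptrdiff_t length;          /* number of code units */
       mock_unicode_t *str;       /* internal buffer */
   } mock_unicode_object;

   /* Like PyUnicode_GET_SIZE: a count of code units, with no type check. */
   #define MOCK_GET_SIZE(op) (((const mock_unicode_object *)(op))->length)

   /* Like PyUnicode_GET_DATA_SIZE: the same buffer measured in bytes. */
   #define MOCK_GET_DATA_SIZE(op) \
       (MOCK_GET_SIZE(op) * (ptrdiff_t)sizeof(mock_unicode_t))

A five-character string therefore reports a size of 5 but a data size of 10
bytes under this UCS2 assumption, and 20 bytes with a 32-bit unit.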

.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)

   Return a pointer to the internal :c:type:`Py_UNICODE` buffer of the object. *o*
   has to be a :c:type:`PyUnicodeObject` (not checked).


.. c:function:: const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to the internal buffer of the object. *o* has to be a
   :c:type:`PyUnicodeObject` (not checked).


.. c:function:: int PyUnicode_ClearFreeList()

   Clear the free list. Return the total number of freed items.

   .. versionadded:: 2.6


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most often needed ones
are available through these macros, which are mapped to C functions depending on
the Python configuration.


.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a whitespace character.


.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a lowercase character.


.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an uppercase character.


.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a titlecase character.


.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a linebreak character.


.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a decimal character.


.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a digit character.


.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a numeric character.


.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphabetic character.


.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphanumeric character.

These APIs can be used for fast direct character conversions:


.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.


.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.


.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.


.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a non-negative decimal integer. Return
   ``-1`` if this is not possible. This macro does not raise exceptions.


.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible. This macro does not raise exceptions.


.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the numeric value of the character *ch* as a double. Return ``-1.0`` if
   this is not possible. This macro does not raise exceptions.

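The conversion macros report failure through their return value rather than an
exception, so callers must check for ``-1`` before using the result. A minimal
sketch of that contract, assuming a hypothetical ``sketch_todecimal`` that
recognizes only three runs of decimal digits (the real macro consults the
Unicode character database and covers every character with a decimal value):

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical, table-free approximation of Py_UNICODE_TODECIMAL that only
      knows three runs of decimal digits. */
   static int sketch_todecimal(uint32_t ch)
   {
       static const uint32_t zeros[] = {
           0x0030,  /* DIGIT ZERO (ASCII) */
           0x0660,  /* ARABIC-INDIC DIGIT ZERO */
           0xFF10,  /* FULLWIDTH DIGIT ZERO */
       };
       for (unsigned i = 0; i < sizeof zeros / sizeof zeros[0]; i++) {
           if (ch >= zeros[i] && ch <= zeros[i] + 9)
               return (int)(ch - zeros[i]);
       }
       return -1;  /* not a decimal digit: signalled by -1, no exception */
   }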

Plain Py_UNICODE
""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:


.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the :c:type:`Py_UNICODE` buffer *u* of the given
   size. *u* may be *NULL*, which causes the contents to be undefined. It is the
   user's responsibility to fill in the needed data. The buffer is copied into
   the new object. If the buffer is not *NULL*, the return value might be a
   shared object. Therefore, modification of the resulting Unicode object is
   only allowed when *u* is *NULL*.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the first *size* bytes of the char buffer *u*.
   The bytes will be interpreted as being UTF-8 encoded. *u* may also be *NULL*,
   which causes the contents to be undefined. It is the user's responsibility to
   fill in the needed data. The buffer is copied into the new object. If the
   buffer is not *NULL*, the return value might be a shared object. Therefore,
   modification of the resulting Unicode object is only allowed when *u* is
   *NULL*.

   .. versionadded:: 2.6


.. c:function:: PyObject *PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.

   .. versionadded:: 2.6


.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)

   Take a C :c:func:`printf`\ -style *format* string and a variable number of
   arguments, calculate the size of the resulting Python Unicode string, and
   return a string with the values formatted into it. The variable arguments
   must be C types and must correspond exactly to the format characters in the
   *format* string. The following format characters are allowed:

   .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
   .. % because not all compilers support the %z width modifier -- we fake it
   .. % when necessary via interpolating PY_FORMAT_SIZE_T.

   .. tabularcolumns:: |l|l|L|

   +-------------------+---------------------+--------------------------------+
   | Format Characters | Type                | Comment                        |
   +===================+=====================+================================+
   | :attr:`%%`        | *n/a*               | The literal % character.       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%c`        | int                 | A single character,            |
   |                   |                     | represented as a C int.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%d`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%d")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%u`        | unsigned int        | Exactly equivalent to          |
   |                   |                     | ``printf("%u")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%ld`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%ld")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
   |                   |                     | ``printf("%lu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zd")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zu`       | size_t              | Exactly equivalent to          |
   |                   |                     | ``printf("%zu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%i`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%i")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%x`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%x")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%s`        | char\*              | A null-terminated C character  |
   |                   |                     | array.                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%p`        | void\*              | The hex representation of a C  |
   |                   |                     | pointer. Mostly equivalent to  |
   |                   |                     | ``printf("%p")`` except that   |
   |                   |                     | it is guaranteed to start with |
   |                   |                     | the literal ``0x`` regardless  |
   |                   |                     | of what the platform's         |
   |                   |                     | ``printf`` yields.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%U`        | PyObject\*          | A Unicode object.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%V`        | PyObject\*, char \* | A Unicode object (which may be |
   |                   |                     | *NULL*) and a null-terminated  |
   |                   |                     | C character array as a second  |
   |                   |                     | parameter (which will be used, |
   |                   |                     | if the first parameter is      |
   |                   |                     | *NULL*).                       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling          |
   |                   |                     | :c:func:`PyObject_Unicode`.    |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling          |
   |                   |                     | :c:func:`PyObject_Repr`.       |
   +-------------------+---------------------+--------------------------------+

   An unrecognized format character causes all the rest of the format string to
   be copied as-is to the result string, and any extra arguments to be
   discarded.

   .. versionadded:: 2.6


.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)

   Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
   arguments: the *format* string and a :c:type:`va_list` holding the values to
   format.

   .. versionadded:: 2.6

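:c:func:`PyUnicode_FromFormat` and :c:func:`PyUnicode_FromFormatV` follow the
usual C pairing of a variadic front end with a :c:type:`va_list` worker, and, as
described above, an unrecognized format character stops argument processing.
The sketch below imitates both points with hypothetical ``mini_format`` and
``mini_formatv`` functions that write into a plain ``char`` buffer and support
only ``%%``, ``%c``, ``%d`` and ``%s``; it is not how CPython builds Unicode
strings:

.. code-block:: c

   #include <stdarg.h>
   #include <stdio.h>
   #include <string.h>

   /* Append the C string `s` to `out` (capacity `cap`, current length `*len`),
      truncating silently if the buffer fills up. Assumes cap > 0. */
   static void append(char *out, size_t cap, size_t *len, const char *s)
   {
       while (*s != '\0' && *len + 1 < cap)
           out[(*len)++] = *s++;
       out[*len] = '\0';
   }

   /* Hypothetical va_list worker, in the role FromFormatV plays. */
   static void mini_formatv(char *out, size_t cap, const char *fmt, va_list vargs)
   {
       char tmp[32];
       size_t len = 0;

       out[0] = '\0';
       for (const char *p = fmt; *p != '\0'; p++) {
           if (*p != '%') {
               tmp[0] = *p;
               tmp[1] = '\0';
               append(out, cap, &len, tmp);
               continue;
           }
           switch (p[1]) {
           case '%':
               append(out, cap, &len, "%");
               break;
           case 'c':
               tmp[0] = (char)va_arg(vargs, int);  /* char is promoted to int */
               tmp[1] = '\0';
               append(out, cap, &len, tmp);
               break;
           case 'd':
               snprintf(tmp, sizeof tmp, "%d", va_arg(vargs, int));
               append(out, cap, &len, tmp);
               break;
           case 's':
               append(out, cap, &len, va_arg(vargs, const char *));
               break;
           default:
               /* Unknown code: copy the rest of the format string verbatim
                  and stop, leaving any remaining arguments unread. */
               append(out, cap, &len, p);
               return;
           }
           p++;  /* skip the format character just handled */
       }
   }

   /* Hypothetical variadic front end, in the role FromFormat plays. */
   static void mini_format(char *out, size_t cap, const char *fmt, ...)
   {
       va_list vargs;
       va_start(vargs, fmt);
       mini_formatv(out, cap, fmt, vargs);
       va_end(vargs);
   }

For example, formatting ``"x=%d %q %d"`` with the arguments ``1, 2`` yields
``x=1 %q %d``: the unknown ``%q`` and everything after it are copied verbatim,
and the argument ``2`` is never read.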

.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal
   :c:type:`Py_UNICODE` buffer, or *NULL* if *unicode* is not a Unicode object.
   Note that the resulting :c:type:`Py_UNICODE*` string may contain embedded
   null characters, which would cause the string to be truncated when used in
   most C functions.


.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the length of the Unicode object.

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.

   Strings and other char buffer compatible objects are decoded according to the
   given *encoding*, using the error handling defined by *errors*. Both can be
   *NULL* to have the interface use the default values (see the next section for
   details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns *NULL* if there was an error. The caller is responsible for
   decref'ing the returned objects.


.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")``, which is
   used throughout the interpreter whenever coercion to Unicode is needed.

If the platform supports :c:type:`wchar_t` and provides the header file
:file:`wchar.h`, Python can interface directly to this type using the following
functions. Support is optimized if Python's own :c:type:`Py_UNICODE` type is
identical to the system's :c:type:`wchar_t`.


wchar_t Support
"""""""""""""""

:c:type:`wchar_t` support for platforms which support it:

.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given *size*.
   Return *NULL* on failure.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
   *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :c:type:`wchar_t` characters
   copied, or -1 in case of an error. Note that the resulting :c:type:`wchar_t`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :c:type:`wchar_t` string is 0-terminated in case this is
   required by the application. Also, note that the :c:type:`wchar_t*` string
   might contain null characters, which would cause the string to be truncated
   when used with most C functions.

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type and used an :c:type:`int`
      type for *size*. This might require changes in your code for properly
      supporting 64-bit systems.

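A common way to satisfy the termination requirement is to reserve one extra unit
and terminate the buffer yourself. The sketch below demonstrates the pattern on
a plain :c:type:`wchar_t` array: ``sketch_as_wide_char`` is a hypothetical
stand-in that follows the copy rules described above (it is not the real
function), and ``copy_terminated`` is a hypothetical caller-side wrapper:

.. code-block:: c

   #include <stddef.h>
   #include <wchar.h>

   /* Hypothetical stand-in with the documented copy rules: copy at most `size`
      units of `src` (length `srclen`, embedded nulls allowed) into `w`. A
      terminator is written only when there is room for it, so the result may
      or may not be 0-terminated. Returns the units copied, without it. */
   static ptrdiff_t sketch_as_wide_char(const wchar_t *src, ptrdiff_t srclen,
                                        wchar_t *w, ptrdiff_t size)
   {
       ptrdiff_t n = srclen < size ? srclen : size;
       for (ptrdiff_t i = 0; i < n; i++)
           w[i] = src[i];
       if (size > srclen)
           w[n] = L'\0';           /* room left over: terminate */
       return n;
   }

   /* Caller-side pattern: reserve one unit of `buf` (size `bufsize` >= 1) and
      always terminate, so the result is safe for C string functions. */
   static ptrdiff_t copy_terminated(const wchar_t *src, ptrdiff_t srclen,
                                    wchar_t *buf, ptrdiff_t bufsize)
   {
       ptrdiff_t n = sketch_as_wide_char(src, srclen, buf, bufsize - 1);
       if (n >= 0)
           buf[n] = L'\0';
       return n;
   }

With the real API, the same pattern would pass ``bufsize - 1`` as *size* to
:c:func:`PyUnicode_AsWideChar`, check the result for ``-1``, and then store a
terminator at the returned index. Even then, an embedded null in the source
string makes ``wcslen`` report fewer units than were copied.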

.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as the corresponding arguments of the built-in
:func:`unicode` Unicode object constructor.

Setting *encoding* to *NULL* causes the default encoding, which is ASCII, to be
used. File system calls should use :c:data:`Py_FileSystemDefaultEncoding` as
the encoding for file names. This variable should be treated as read-only: on
some systems, it will be a pointer to a static string; on others, it will change
at run time (such as when the application invokes :c:func:`setlocale`).

Error handling is set by *errors*, which may also be set to *NULL*, meaning to
use the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface. For simplicity, only deviations from
the following generic ones are documented.

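The default behavior described above can be pictured with a small decoder. The
sketch assumes a hypothetical ``sketch_decode_ascii`` that writes code points
into a caller-supplied array; it treats an *errors* value of *NULL* as
``"strict"`` and returns ``-1`` where the real API would raise an exception.
Only the ``"strict"``, ``"ignore"`` and ``"replace"`` handler names are modeled:

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical decoder: decode `size` bytes of `s` as ASCII into `out`
      (room for `size` code points). NULL `errors` means "strict". Returns the
      number of code points written, or -1 where the real API would raise. */
   static ptrdiff_t sketch_decode_ascii(const char *s, size_t size,
                                        const char *errors, uint32_t *out)
   {
       ptrdiff_t n = 0;

       if (errors == NULL)
           errors = "strict";                  /* the codec's default */
       for (size_t i = 0; i < size; i++) {
           unsigned char c = (unsigned char)s[i];
           if (c < 0x80)
               out[n++] = c;                   /* plain ASCII byte */
           else if (strcmp(errors, "replace") == 0)
               out[n++] = 0xFFFD;              /* U+FFFD REPLACEMENT CHARACTER */
           else if (strcmp(errors, "ignore") == 0)
               continue;                       /* drop the offending byte */
           else
               return -1;                      /* "strict" or unknown handler */
       }
       return n;
   }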

Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`unicode` built-in function. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
   string object. *encoding* and *errors* have the same meaning as the parameters
   of the same name in the Unicode :meth:`~unicode.encode` method. The codec
   to be used is looked up using the Python codec registry. Return *NULL* if
   an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)

   Encode a Unicode object and return the result as a Python string object.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the Unicode :meth:`~unicode.encode` method. The codec to be used is looked
   up using the Python codec registry. Return *NULL* if an exception was raised
   by the codec.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

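The *consumed* rule is what makes incremental decoding possible: the bytes of a
sequence split across two reads are left for the next call instead of raising an
error. A simplified sketch, assuming a hypothetical ``sketch_decode_utf8`` that
writes code points to an array; it is not CPython's decoder, and its validation
(rejecting overlong forms and values above U+10FFFF) may differ from CPython's
in edge cases:

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical, simplified UTF-8 decoder writing code points to `out`.
      Returns the number written, or -1 on invalid input. When `consumed` is
      non-NULL, an incomplete sequence at the very end is not an error: it is
      left undecoded and *consumed receives the number of bytes decoded. */
   static ptrdiff_t sketch_decode_utf8(const unsigned char *s, size_t size,
                                       size_t *consumed, uint32_t *out)
   {
       size_t i = 0;
       ptrdiff_t n = 0;

       while (i < size) {
           unsigned char c = s[i];
           size_t len;
           uint32_t cp;

           if (c < 0x80)                    { len = 1; cp = c; }
           else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
           else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
           else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
           else
               return -1;                     /* invalid start byte */

           if (i + len > size) {
               /* The input ends inside this sequence. It only counts as
                  incomplete if every byte present is a continuation byte. */
               for (size_t k = i + 1; k < size; k++) {
                   if ((s[k] & 0xC0) != 0x80)
                       return -1;
               }
               if (consumed == NULL)
                   return -1;                 /* stateless mode: an error */
               break;                         /* stateful mode: stop here */
           }
           for (size_t k = 1; k < len; k++) {
               if ((s[i + k] & 0xC0) != 0x80)
                   return -1;                 /* bad continuation byte */
               cp = (cp << 6) | (s[i + k] & 0x3F);
           }
           if ((len == 3 && cp < 0x800) ||
               (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)))
               return -1;                     /* overlong or out of range */
           out[n++] = cp;
           i += len;
       }
       if (consumed != NULL)
           *consumed = i;
       return n;
   }

A caller reading a stream would keep the ``size - *consumed`` leftover bytes and
prepend them to the next chunk before decoding again.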

.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and
   return a Python string object. Return *NULL* if an exception was raised by the
   codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

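For reference, the byte patterns produced by UTF-8 encoding can be sketched as
follows. ``sketch_utf8_put`` and ``sketch_encode_utf8`` are hypothetical helpers
that encode an array of code points rather than a :c:type:`Py_UNICODE` buffer,
so the surrogate-pair handling a narrow build needs is left out:

.. code-block:: c

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical helper: write the UTF-8 form of one code point (at most
      U+10FFFF) to `out` and return the number of bytes written (1 to 4). */
   static size_t sketch_utf8_put(uint32_t cp, unsigned char *out)
   {
       if (cp < 0x80) {
           out[0] = (unsigned char)cp;
           return 1;
       }
       if (cp < 0x800) {
           out[0] = (unsigned char)(0xC0 | (cp >> 6));
           out[1] = (unsigned char)(0x80 | (cp & 0x3F));
           return 2;
       }
       if (cp < 0x10000) {
           out[0] = (unsigned char)(0xE0 | (cp >> 12));
           out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
           out[2] = (unsigned char)(0x80 | (cp & 0x3F));
           return 3;
       }
       out[0] = (unsigned char)(0xF0 | (cp >> 18));
       out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
       out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
       out[3] = (unsigned char)(0x80 | (cp & 0x3F));
       return 4;
   }

   /* Encode `n` code points; `out` needs room for up to 4 * n bytes. */
   static size_t sketch_encode_utf8(const uint32_t *cps, size_t n,
                                    unsigned char *out)
   {
       size_t len = 0;
       for (size_t i = 0; i < n; i++)
           len += sketch_utf8_put(cps[i], out + len);
       return len;
   }

Encoding U+0041, U+00E9, U+20AC and U+1F600 this way produces 1, 2, 3 and 4
bytes respectively.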
Georg Brandlf6842722008-01-19 22:08:21 +0000515
Sandro Tosi98ed08f2012-01-14 16:42:02 +0100516.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
Georg Brandlf6842722008-01-19 22:08:21 +0000517
518 Encode a Unicode object using UTF-8 and return the result as Python string
519 object. Error handling is "strict". Return *NULL* if an exception was raised
520 by the codec.
521
Georg Brandlf6842722008-01-19 22:08:21 +0000522
Victor Stinner5f8aae02010-05-14 15:53:20 +0000523UTF-32 Codecs
524"""""""""""""
525
526These are the UTF-32 codec APIs:
Georg Brandlf6842722008-01-19 22:08:21 +0000527
528
Sandro Tosi98ed08f2012-01-14 16:42:02 +0100529.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
Georg Brandlf6842722008-01-19 22:08:21 +0000530
Ezio Melotti020f6502011-04-14 07:39:06 +0300531 Decode *size* bytes from a UTF-32 encoded buffer string and return the
Georg Brandlf6842722008-01-19 22:08:21 +0000532 corresponding Unicode object. *errors* (if non-*NULL*) defines the error
533 handling. It defaults to "strict".
534
535 If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
536 order::
537
538 *byteorder == -1: little endian
539 *byteorder == 0: native order
540 *byteorder == 1: big endian
541
Georg Brandl579a3582009-09-18 21:35:59 +0000542 If ``*byteorder`` is zero, and the first four bytes of the input data are a
543 byte order mark (BOM), the decoder switches to this byte order and the BOM is
544 not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
545 ``1``, any byte order mark is copied to the output.
546
547 After completion, *\*byteorder* is set to the current byte order at the end
548 of input data.
Georg Brandlf6842722008-01-19 22:08:21 +0000549
Georg Brandla44ec3f2015-01-14 08:26:30 +0100550 In a narrow build code points outside the BMP will be decoded as surrogate pairs.
Georg Brandlf6842722008-01-19 22:08:21 +0000551
552 If *byteorder* is *NULL*, the codec starts in native order mode.
553
554 Return *NULL* if an exception was raised by the codec.
555
556 .. versionadded:: 2.6
557

.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
   by four) as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.6

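A caller that feeds data in chunks typically keeps the bytes the decoder did not consume and prepends them to the next chunk. The hypothetical ``struct pending`` and ``feed_utf32`` below sketch only that bookkeeping, assuming (as described above) that the stateful decoder consumes whole 4-byte units. They stand in for, rather than call, :c:func:`PyUnicode_DecodeUTF32Stateful`, and the 64-byte buffer is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical carry-over buffer for bytes not yet consumed. */
struct pending {
    unsigned char buf[64];
    size_t len;
};

/* Append a chunk, return how many bytes of the combined buffer a stateful
   UTF-32 decoder would consume (whole 4-byte units only), and keep the
   unconsumed tail at the front of the buffer for the next call. */
static size_t feed_utf32(struct pending *p, const unsigned char *chunk, size_t n)
{
    size_t consumed;
    assert(p->len + n <= sizeof p->buf);
    memcpy(p->buf + p->len, chunk, n);
    p->len += n;
    consumed = p->len - p->len % 4;          /* drop a trailing partial unit */
    memmove(p->buf, p->buf + consumed, p->len - consumed);
    p->len -= consumed;
    return consumed;
}
```

Keeping the tail in the caller's buffer mirrors the contract above: the incomplete bytes are not an error, they are simply handed back on the next call.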

.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0: native byte order (writes a BOM)
      byteorder == 1: big endian

   If byteorder is ``0``, the output string will always start with the Unicode BOM
   (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is not defined, surrogate pairs will be output
   as a single code point.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6

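The output layout described above can be illustrated with a small standalone sketch. ``encode_utf32_one`` is a hypothetical helper (not CPython's encoder) that writes a single code point using the documented *byteorder* convention and prefixes the BOM U+FEFF only in native mode. The output buffer is assumed to hold at least 8 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Store a 32-bit value in little- or big-endian byte order. */
static void put_u32(unsigned char *out, unsigned long v, int little)
{
    int i;
    for (i = 0; i < 4; i++) {
        int shift = little ? 8 * i : 8 * (3 - i);
        out[i] = (unsigned char)((v >> shift) & 0xFFUL);
    }
}

/* Hypothetical sketch: encode one code point as UTF-32 following the
   byteorder convention (-1 little, 0 native with BOM, 1 big). Returns the
   number of bytes written to out (4, or 8 when a BOM is prepended). */
static size_t encode_utf32_one(unsigned long cp, int byteorder,
                               unsigned char *out)
{
    size_t n = 0;
    int little;
    assert(cp <= 0x10FFFFUL);
    if (byteorder == 0) {
        const unsigned int probe = 1;
        little = *(const unsigned char *)&probe == 1;  /* host byte order */
        put_u32(out, 0xFEFFUL, little);                /* native mode writes a BOM */
        n = 4;
    } else {
        little = byteorder < 0;
    }
    put_u32(out + n, cp, little);
    return n + 4;
}
```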

.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)

   Return a Python string using the UTF-32 encoding in native byte order. The
   string always starts with a BOM. Error handling is "strict". Return
   *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6


UTF-16 Codecs
"""""""""""""

These are the UTF-16 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0: native order
      *byteorder == 1: big endian

   If ``*byteorder`` is zero, and the first two bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output (where it will result in
   either a ``\ufeff`` or a ``\ufffe`` character).

   After completion, *\*byteorder* is set to the current byte order at the end
   of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

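The UTF-16 variant of the byte order rules, including the ``\ufffe`` case, can be sketched the same way. ``utf16_resolve_byteorder`` and ``utf16_read_unit`` are hypothetical helpers for illustration only; they assume a non-*NULL* *byteorder* and do not handle surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the documented *byteorder rules for UTF-16 input.
   Returns 2 if a BOM was recognised and should be skipped, otherwise 0. */
static size_t utf16_resolve_byteorder(const unsigned char *s, size_t size,
                                      int *byteorder)
{
    assert(byteorder != NULL);
    if (*byteorder != 0 || size < 2)
        return 0;                 /* explicit order: a BOM is decoded as data */
    if (s[0] == 0xFF && s[1] == 0xFE) {
        *byteorder = -1;          /* FF FE: little-endian BOM */
        return 2;
    }
    if (s[0] == 0xFE && s[1] == 0xFF) {
        *byteorder = 1;           /* FE FF: big-endian BOM */
        return 2;
    }
    return 0;
}

/* Assemble one 16-bit code unit; byteorder 0 means host order. */
static unsigned int utf16_read_unit(const unsigned char *p, int byteorder)
{
    if (byteorder == 0) {
        const unsigned int probe = 1;
        byteorder = (*(const unsigned char *)&probe == 1) ? -1 : 1;
    }
    return byteorder < 0 ? (unsigned int)(p[0] | (p[1] << 8))
                         : (unsigned int)((p[0] << 8) | p[1]);
}
```

Reading the little-endian BOM bytes ``FF FE`` with ``*byteorder == 1`` yields U+FFFE, which is the ``\ufffe`` character mentioned above.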

.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size* and an :c:type:`int *`
      type for *consumed*. This might require changes in your code for
      properly supporting 64-bit systems.

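The two kinds of incomplete trailing data mentioned above, an odd byte and the first half of a surrogate pair, can be detected as in the hypothetical helper below. It only models how many bytes a stateful decoder could safely consume from a little- or big-endian buffer; it is not CPython's implementation and it does not validate the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of leading bytes of a UTF-16 buffer that can
   be decoded without splitting a code unit or a surrogate pair. */
static size_t utf16_complete_bytes(const unsigned char *s, size_t size,
                                   int little_endian)
{
    size_t n = size - size % 2;            /* drop an odd trailing byte */
    assert(s != NULL || size == 0);
    if (n >= 2) {
        const unsigned char *last = s + n - 2;
        unsigned int unit = little_endian
            ? (unsigned int)(last[0] | (last[1] << 8))
            : (unsigned int)((last[0] << 8) | last[1]);
        if (unit >= 0xD800u && unit <= 0xDBFFu)
            n -= 2;                        /* high surrogate still waiting for its partner */
    }
    return n;
}
```

For example, U+1F600 in UTF-16-LE is ``3D D8 00 DE``; a buffer that stops after ``3D D8`` must keep those two bytes for the next call.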

.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-16 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0: native byte order (writes a BOM)
      byteorder == 1: big endian

   If byteorder is ``0``, the output string will always start with the Unicode BOM
   (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is defined, a single :c:type:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

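For reference, the surrogate pair mentioned above is computed with the standard UTF-16 arithmetic. The hypothetical helper below shows it for a single code point; it is a sketch of the encoding rule, not the code CPython uses.

```c
#include <assert.h>

/* Hypothetical helper: split a code point into UTF-16 code units. Code
   points up to U+FFFF need one unit; larger ones need a surrogate pair.
   Returns the number of units written to out. */
static int utf16_units(unsigned long cp, unsigned int out[2])
{
    assert(cp <= 0x10FFFFUL);
    if (cp < 0x10000UL) {
        out[0] = (unsigned int)cp;
        return 1;
    }
    cp -= 0x10000UL;                                /* 20 bits remain */
    out[0] = 0xD800u + (unsigned int)(cp >> 10);    /* high surrogate: top 10 bits */
    out[1] = 0xDC00u + (unsigned int)(cp & 0x3FFu); /* low surrogate: bottom 10 bits */
    return 2;
}
```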

.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)

   Return a Python string using the UTF-16 encoding in native byte order. The
   string always starts with a BOM. Error handling is "strict". Return
   *NULL* if an exception was raised by the codec.


UTF-7 Codecs
""""""""""""

These are the UTF-7 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`. If
   *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
   be treated as an error. Those bytes will not be decoded and the number of
   bytes that have been decoded will be stored in *consumed*.


.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using UTF-7 and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.

   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
   special meaning) will be encoded in base-64. If *base64WhiteSpace* is
   nonzero, whitespace will be encoded in base-64. Both are set to zero for the
   Python "utf-7" codec.


Unicode-Escape Codecs
"""""""""""""""""""""

These are the "Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
   return a Python string object. Return *NULL* if an exception was raised by the
   codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Unicode-Escape and return the result as a Python
   string object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.

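To give a feel for the Unicode-Escape format, the hypothetical function below approximates how a single code point is spelled: printable ASCII passes through (except the backslash, which is doubled), ``\t``, ``\n`` and ``\r`` use their C-style escapes, other code points below 256 become ``\xhh``, and larger ones become ``\uXXXX`` or ``\UXXXXXXXX`` with lowercase hex digits. It is an illustrative approximation written for this page, not the codec's source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical approximation of how Unicode-Escape spells one code point.
   out must have room for 11 bytes: a backslash, 'U', 8 hex digits, NUL. */
static void unicode_escape_one(unsigned long cp, char out[11])
{
    assert(cp <= 0x10FFFFUL);
    if (cp >= 0x10000UL)
        snprintf(out, 11, "\\U%08lx", cp);
    else if (cp >= 0x100UL)
        snprintf(out, 11, "\\u%04lx", cp);
    else if (cp == '\\')
        strcpy(out, "\\\\");                 /* a literal backslash is doubled */
    else if (cp == '\t')
        strcpy(out, "\\t");
    else if (cp == '\n')
        strcpy(out, "\\n");
    else if (cp == '\r')
        strcpy(out, "\\r");
    else if (cp < 0x20UL || cp >= 0x7FUL)
        snprintf(out, 11, "\\x%02lx", cp);   /* other non-printable values */
    else {
        out[0] = (char)cp;                   /* printable ASCII is copied */
        out[1] = '\0';
    }
}
```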

Raw-Unicode-Escape Codecs
"""""""""""""""""""""""""

These are the "Raw Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
   and return a Python string object. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Raw-Unicode-Escape and return the result as a
   Python string object. Error handling is "strict". Return *NULL* if an exception
   was raised by the codec.


Latin-1 Codecs
""""""""""""""

These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
ordinals and only these are accepted by the codecs during encoding.

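Because Latin-1 is exactly the first 256 code points, decoding is a byte-for-byte copy and strict encoding only has to reject larger ordinals. The hypothetical ``latin1_encode`` below sketches that rule on an array of code points; the real codec reports the failure by raising :exc:`UnicodeEncodeError` rather than returning an index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of strict Latin-1 encoding. Each code point below 256
   becomes one byte with the same value. Returns -1 on success, or the index
   of the first code point that cannot be encoded. */
static long latin1_encode(const unsigned long *cp, size_t n, unsigned char *out)
{
    size_t i;
    assert(n == 0 || (cp != NULL && out != NULL));
    for (i = 0; i < n; i++) {
        if (cp[i] > 0xFFUL)
            return (long)i;           /* outside Latin-1: an error under "strict" */
        out[i] = (unsigned char)cp[i];
    }
    return -1;
}
```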

.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and return
   a Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)

   Encode a Unicode object using Latin-1 and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


ASCII Codecs
""""""""""""

These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
codes generate errors.

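The ASCII rule is the 7-bit counterpart of the Latin-1 one. The hypothetical ``ascii_decode`` below sketches strict decoding: bytes up to 0x7F map to the same code point, and the first byte with the high bit set is reported by its index (the real codec raises :exc:`UnicodeDecodeError` under "strict").

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of strict ASCII decoding into an array of code points.
   Returns -1 on success, or the index of the first non-ASCII byte. */
static long ascii_decode(const unsigned char *s, size_t size, unsigned long *out)
{
    size_t i;
    assert(size == 0 || (s != NULL && out != NULL));
    for (i = 0; i < size; i++) {
        if (s[i] > 0x7F)
            return (long)i;           /* high bit set: not 7-bit ASCII */
        out[i] = s[i];
    }
    return -1;
}
```

For instance, the UTF-8 bytes of ``café`` fail at index 3, the first byte of the two-byte sequence for ``é``.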

.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)

   Encode a Unicode object using ASCII and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


Character Map Codecs
""""""""""""""""""""

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.

Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals), or None
(meaning "undefined mapping" and causing an error).

Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals), or None
(meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode or
Latin-1 ordinal, respectively. Because of this, mappings only need to contain
those mappings which map characters to different code points.

These are the mapping codec APIs:

.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object. Return *NULL* if an exception was raised by the
   codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can
   be a dictionary mapping bytes, or a unicode string that is treated as a lookup
   table. Byte values that index beyond the end of the string, and U+FFFE
   "characters", are treated as "undefined mapping".

   .. versionchanged:: 2.4
      Allowed unicode string as mapping argument.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

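The lookup-table form of *mapping* described above can be modelled as follows. ``charmap_decode`` is a hypothetical helper in which the unicode string is represented by an array of code points (*table*, *table_len*). It reports the first byte that hits an "undefined mapping" instead of raising an exception, and it does not model the dictionary form or error handlers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of charmap decoding with a unicode-string table:
   byte b decodes to table[b]. A byte that indexes past the end of the table,
   or whose entry is U+FFFE, is an "undefined mapping". Returns -1 on
   success, or the index of the first undecodable byte. */
static long charmap_decode(const unsigned char *s, size_t size,
                           const unsigned long *table, size_t table_len,
                           unsigned long *out)
{
    size_t i;
    assert(size == 0 || (s != NULL && out != NULL));
    for (i = 0; i < size; i++) {
        if ((size_t)s[i] >= table_len || table[s[i]] == 0xFFFEUL)
            return (long)i;       /* "strict" handling would raise here */
        out[i] = table[s[i]];
    }
    return -1;
}
```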

.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return a Python string object. Return *NULL* if an
   exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

   Encode a Unicode object using the given *mapping* object and return the result
   as a Python string object. Error handling is "strict". Return *NULL* if an
   exception was raised by the codec.

The following codec API is special in that it maps Unicode to Unicode.


.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)

   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
   character mapping *table* to it and return the resulting Unicode object. Return
   *NULL* when an exception was raised by the codec.

   The *table* must map Unicode ordinal integers to Unicode ordinal
   integers or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

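The translation rules can be illustrated with a hypothetical, self-contained model in which the mapping is a small array of ``struct map_entry`` pairs and ``TRANSLATE_DELETE`` stands in for None. Ordinals without an entry, the ones that would raise :exc:`LookupError` in the real API, are copied unchanged. The linear search is for clarity only.

```c
#include <assert.h>
#include <stddef.h>

#define TRANSLATE_DELETE (-1L)   /* stands in for a None value in the table */

struct map_entry {
    unsigned long from;          /* source ordinal */
    long to;                     /* target ordinal, or TRANSLATE_DELETE */
};

/* Hypothetical model of character translation. Writes the result to out
   (which must hold at least n code points) and returns its length. */
static size_t translate(const unsigned long *in, size_t n,
                        const struct map_entry *table, size_t table_len,
                        unsigned long *out)
{
    size_t i, j, k = 0;
    assert(n == 0 || (in != NULL && out != NULL));
    for (i = 0; i < n; i++) {
        long to = (long)in[i];                   /* unmapped: copy as-is */
        for (j = 0; j < table_len; j++) {
            if (table[j].from == in[i]) {
                to = table[j].to;
                break;
            }
        }
        if (to != TRANSLATE_DELETE)
            out[k++] = (unsigned long)to;        /* deleted entries are skipped */
    }
    return k;
}
```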

MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.


.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.

   .. versionadded:: 2.5


.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


Methods & Slots
"""""""""""""""

.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

They all return *NULL* or ``-1`` if an exception occurs.


.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string, giving a list of Unicode strings. If *sep* is *NULL*, splitting
   will be done at all whitespace substrings. Otherwise, splits occur at the given
   separator. At most *maxsplit* splits will be done. If *maxsplit* is negative, no
   limit is set. Separators are not included in the resulting list.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *maxsplit*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break. If *keepend* is 0, the line break
   characters are not included in the resulting strings.


.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be *NULL*, which indicates
   that the default error handling should be used.


.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given *separator* and return the resulting
   Unicode string.


.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return 1 if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
   0 otherwise. Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.

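The slice-then-compare semantics of :c:func:`PyUnicode_Tailmatch` can be sketched on plain byte strings. ``model_tailmatch`` is a hypothetical helper that only supports non-negative *start* and *end* (Python-style negative indices are not modelled) and clips *end* to the string length the way slicing does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of PyUnicode_Tailmatch on byte strings: compare sub
   against the start (direction == -1) or the end (direction == 1) of
   str[start:end]. Returns 1 on a match, 0 otherwise. */
static int model_tailmatch(const char *str, size_t len,
                           const char *sub, size_t sublen,
                           size_t start, size_t end, int direction)
{
    assert(direction == -1 || direction == 1);
    if (end > len)
        end = len;                         /* clip like a slice */
    if (start > end || sublen > end - start)
        return 0;                          /* slice too short to hold sub */
    if (direction < 0)
        return memcmp(str + start, sub, sublen) == 0;
    return memcmp(str + end - sublen, sub, sublen) == 0;
}
```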

.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
   backward search). The return value is the index of the first match; a value of
   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   occurred and an exception has been set.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.

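The effect of *direction* can be illustrated with a hypothetical byte-string model. ``model_find`` returns the index of the first match in the search direction, so a backward search reports the rightmost occurrence. It returns ``-1`` when there is no match; the ``-2`` error result of the real API has no counterpart because the model cannot fail. Negative indices are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of PyUnicode_Find on byte strings. Searches
   str[start:end] forward (direction == 1) or backward (direction == -1)
   and returns the index of the first match found, or -1 if there is none. */
static long model_find(const char *str, size_t len, const char *sub,
                       size_t sublen, size_t start, size_t end, int direction)
{
    size_t i;
    assert(direction == -1 || direction == 1);
    if (end > len)
        end = len;
    if (start > end || sublen > end - start)
        return -1;
    if (direction > 0) {
        for (i = start; i + sublen <= end; i++)
            if (memcmp(str + i, sub, sublen) == 0)
                return (long)i;            /* leftmost match */
    } else {
        for (i = end - sublen + 1; i-- > start; )
            if (memcmp(str + i, sub, sublen) == 0)
                return (long)i;            /* rightmost match */
    }
    return -1;
}
```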
Georg Brandlf6842722008-01-19 22:08:21 +00001063
Sandro Tosi98ed08f2012-01-14 16:42:02 +01001064.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
Georg Brandlf6842722008-01-19 22:08:21 +00001065
1066 Return the number of non-overlapping occurrences of *substr* in
1067 ``str[start:end]``. Return ``-1`` if an error occurred.
1068
Jeroen Ruigrok van der Wervendfcffd42009-04-25 21:16:05 +00001069 .. versionchanged:: 2.5
Sandro Tosi98ed08f2012-01-14 16:42:02 +01001070 This function returned an :c:type:`int` type and used an :c:type:`int`
Jeroen Ruigrok van der Wervendfcffd42009-04-25 21:16:05 +00001071 type for *start* and *end*. This might require changes in your code for
1072 properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == -1 means replace all
   occurrences.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *maxcount*. This might
      require changes in your code to properly support 64-bit systems.


.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return ``-1``, ``0``, or ``1`` for less than, equal,
   and greater than, respectively. On failure, ``-1`` is also returned with an
   exception set; use :c:func:`PyErr_Occurred` to tell an error apart from a
   "less than" result.


.. c:function:: int PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich-compare two Unicode strings and return one of the following:

   * ``NULL`` if an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` if the type combination is unknown

   Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   :exc:`UnicodeWarning` if the conversion of the arguments to Unicode fails
   with a :exc:`UnicodeDecodeError`.

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.


.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new Unicode object from *format* and *args*; this is analogous to
   ``format % args``.


.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return ``1`` if it
   is, ``0`` if it is not.

   *element* has to coerce to a Unicode string; the check is a substring test,
   as with the ``in`` operator. ``-1`` is returned if there was an error.