.. highlightlang:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>

Unicode Objects
^^^^^^^^^^^^^^^


Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:


.. ctype:: Py_UNICODE

   This type represents the storage type which is used by Python internally as
   the basis for holding Unicode ordinals. Python's default builds use a 16-bit
   type for :ctype:`Py_UNICODE` and store Unicode values internally as UCS2. It
   is also possible to build a UCS4 version of Python (most recent Linux
   distributions come with UCS4 builds of Python). These builds then use a
   32-bit type for :ctype:`Py_UNICODE` and store Unicode data internally as
   UCS4. On platforms where :ctype:`wchar_t` is available and compatible with
   the chosen Python Unicode build variant, :ctype:`Py_UNICODE` is a typedef
   alias for :ctype:`wchar_t` to enhance native platform compatibility. On all
   other platforms, :ctype:`Py_UNICODE` is a typedef alias for either
   :ctype:`unsigned short` (UCS2) or :ctype:`unsigned long` (UCS4).

Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.


.. ctype:: PyUnicodeObject

   This subtype of :ctype:`PyObject` represents a Python Unicode object.


.. cvar:: PyTypeObject PyUnicode_Type

   This instance of :ctype:`PyTypeObject` represents the Python Unicode type. It
   is exposed to Python code as ``unicode`` and ``types.UnicodeType``.

The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:


.. cfunction:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.

   .. versionchanged:: 2.2
      Allowed subtypes to be accepted.


.. cfunction:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.

   .. versionadded:: 2.2


.. cfunction:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the object. *o* has to be a :ctype:`PyUnicodeObject` (not
   checked).

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. cfunction:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the object's internal buffer in bytes. *o* has to be a
   :ctype:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. cfunction:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)

   Return a pointer to the internal :ctype:`Py_UNICODE` buffer of the object. *o*
   has to be a :ctype:`PyUnicodeObject` (not checked).


.. cfunction:: const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to the internal buffer of the object. *o* has to be a
   :ctype:`PyUnicodeObject` (not checked).


.. cfunction:: int PyUnicode_ClearFreeList()

   Clear the free list. Return the total number of freed items.

   .. versionadded:: 2.6


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.


.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a whitespace character.


.. cfunction:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a lowercase character.


.. cfunction:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an uppercase character.


.. cfunction:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a titlecase character.


.. cfunction:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a linebreak character.


.. cfunction:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a decimal character.


.. cfunction:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a digit character.


.. cfunction:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a numeric character.


.. cfunction:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphabetic character.


.. cfunction:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphanumeric character.

These APIs can be used for fast direct character conversions:


.. cfunction:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.


.. cfunction:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.


.. cfunction:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.


.. cfunction:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a decimal positive integer. Return
   ``-1`` if this is not possible. This macro does not raise exceptions.


.. cfunction:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible. This macro does not raise exceptions.


.. cfunction:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
   possible. This macro does not raise exceptions.


Plain Py_UNICODE
""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:


.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the :ctype:`Py_UNICODE` buffer *u* of the given
   size. *u* may be *NULL*, which causes the contents to be undefined. It is the
   user's responsibility to fill in the needed data. The buffer is copied into
   the new object. If the buffer is not *NULL*, the return value might be a
   shared object. Therefore, modification of the resulting Unicode object is
   only allowed when *u* is *NULL*.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal :ctype:`Py_UNICODE`
   buffer, or *NULL* if *unicode* is not a Unicode object.


.. cfunction:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the length of the Unicode object.

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.

   String and other char buffer compatible objects are decoded according to the
   given *encoding* and using the error handling defined by *errors*. Both can be
   *NULL* to have the interface use the default values (see the next section for
   details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns *NULL* if there was an error. The caller is responsible for
   decref'ing the returned object.


.. cfunction:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
   throughout the interpreter whenever coercion to Unicode is needed.

If the platform supports :ctype:`wchar_t` and provides a header file wchar.h,
Python can interface directly to this type using the following functions.
Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
the system's :ctype:`wchar_t`.


wchar_t Support
"""""""""""""""

wchar_t support for platforms which support it:

.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :ctype:`wchar_t` buffer *w* of the given size.
   Return *NULL* on failure.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :ctype:`wchar_t` buffer *w*. At most
   *size* :ctype:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :ctype:`wchar_t` characters
   copied or -1 in case of an error. Note that the resulting :ctype:`wchar_t`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :ctype:`wchar_t` string is 0-terminated in case this is
   required by the application.

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type and used an :ctype:`int`
      type for *size*. This might require changes in your code for properly
      supporting 64-bit systems.

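The termination caveat for :cfunc:`PyUnicode_AsWideChar` suggests a defensive
pattern on the caller's side: reserve one extra slot and write the terminator
explicitly. The sketch below illustrates this in plain C without the Python
headers. ``copy_wide`` is a hypothetical stand-in that only mimics the
documented contract (copy at most *size* characters, never promise a
terminator), and ``copy_wide_terminated`` is an assumed helper name, not part
of the API.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-in for the documented contract: copies at most
 * `size` wchar_t characters and never writes a terminator. */
static ptrdiff_t copy_wide(const wchar_t *src, size_t src_len,
                           wchar_t *w, size_t size)
{
    size_t n = src_len < size ? src_len : size;
    memcpy(w, src, n * sizeof(wchar_t));
    return (ptrdiff_t)n;
}

/* Caller-side pattern: keep one slot of the buffer back for the
 * terminator and always write it after the copy. */
static ptrdiff_t copy_wide_terminated(const wchar_t *src, size_t src_len,
                                      wchar_t *buf, size_t buf_len)
{
    ptrdiff_t n;

    if (buf_len == 0)
        return -1;                    /* no room even for the terminator */
    n = copy_wide(src, src_len, buf, buf_len - 1);
    if (n < 0)
        return -1;                    /* propagate the error */
    buf[n] = L'\0';                   /* explicit 0-termination */
    return n;
}
```

With the real API, the same idea would amount to passing ``buf_len - 1`` as
*size* and storing ``L'\0'`` at the returned index.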

.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, *encoding* and *errors*. These
parameters have the same semantics as the ones of the built-in :func:`unicode`
Unicode object constructor.

Setting *encoding* to *NULL* causes the default encoding to be used, which is
ASCII. The file system calls should use :cdata:`Py_FileSystemDefaultEncoding`
as the encoding for file names. This variable should be treated as read-only: on
some systems, it will be a pointer to a static string, on others, it will change
at run-time (such as when the application invokes setlocale).

Error handling is set by *errors*, which may also be set to *NULL*, meaning to
use the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface. For simplicity, only deviations from the
following generic ones are documented.


Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`unicode` built-in function. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size and return a Python
   string object. *encoding* and *errors* have the same meaning as the parameters
   of the same name in the Unicode :meth:`encode` method. The codec to be used is
   looked up using the Python codec registry. Return *NULL* if an exception was
   raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)

   Encode a Unicode object and return the result as a Python string object.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the Unicode :meth:`encode` method. The codec to be used is looked up using
   the Python codec registry. Return *NULL* if an exception was raised by the
   codec.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

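To make the notion of a "trailing incomplete UTF-8 byte sequence" concrete, the
following plain C sketch computes how many leading bytes of a buffer end on a
sequence boundary, which is roughly the value a caller would expect to find in
*consumed* for otherwise valid input. It is an illustration only and not the
decoder's implementation; the function name ``utf8_complete_prefix`` is an
assumption, and invalid input is deliberately left to the codec's error
handling.

```c
#include <stddef.h>

/* Return the number of leading bytes of s[0..size) that do not end in
 * a truncated multi-byte sequence. Only the tail is inspected;
 * validation of the rest is left to a real decoder. */
static size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t back;

    /* A UTF-8 sequence is at most 4 bytes long, so the lead byte of
     * the final sequence is among the last 4 bytes. */
    for (back = 1; back <= 4 && back <= size; back++) {
        unsigned char c = s[size - back];
        size_t need;

        if ((c & 0xC0) == 0x80)
            continue;                 /* continuation byte, keep looking */
        if (c < 0x80)
            need = 1;                 /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return size;              /* invalid lead byte: not our job */
        /* `back` bytes are available from the lead byte to the end. */
        return need > back ? size - back : size;
    }
    return size;                      /* empty, or only continuation bytes */
}
```

A streaming caller would typically keep the bytes after the returned offset
and prepend them to the next chunk before decoding again.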

.. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-8 and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)

   Encode a Unicode object using UTF-8 and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


UTF-32 Codecs
"""""""""""""

These are the UTF-32 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first four bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output.

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   In a narrow build, code points outside the BMP will be decoded as surrogate
   pairs.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6

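The remark about narrow builds can be illustrated with the standard UTF-16
surrogate arithmetic. The sketch below is plain C and independent of the Python
headers; ``to_utf16_units`` is an assumed name chosen for this example and not
part of the API. It shows how one code point above U+FFFF becomes two 16-bit
units, which is what a 16-bit :ctype:`Py_UNICODE` buffer would have to hold.

```c
#include <stdint.h>

/* Split a Unicode code point into UTF-16 code units.
 * Returns the number of units written to out (1 or 2), or 0 if cp is
 * not a valid scalar value (a lone surrogate or above U+10FFFF). */
static int to_utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;        /* BMP: stored as a single unit */
        return 1;
    }
    cp -= 0x10000;                    /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 yields the pair ``0xD83D``, ``0xDE00``, so code that
counts characters by counting :ctype:`Py_UNICODE` units would see two entries
on a narrow build.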

.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
   by four) as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.6


.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM mark)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended.

   If *Py_UNICODE_WIDE* is not defined, surrogate pairs will be output
   as a single codepoint.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6

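The three *byteorder* modes described for :cfunc:`PyUnicode_EncodeUTF32` can be
sketched in plain C as follows. This is an illustration of the documented
behaviour, not the codec's source: the names ``encode_utf32_sketch``,
``put_u32`` and ``native_is_little_endian`` are assumptions, the input is taken
as an array of full code points, and no error handling is modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* Detect the byte order of the machine running this code. */
static int native_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Store one 32-bit value in the requested byte order. */
static void put_u32(unsigned char *out, uint32_t v, int little)
{
    int i;
    for (i = 0; i < 4; i++) {
        int shift = little ? 8 * i : 8 * (3 - i);
        out[i] = (unsigned char)((v >> shift) & 0xFF);
    }
}

/* Encode n code points as UTF-32 into out, which must hold at least
 * 4 * (n + 1) bytes. byteorder follows the table above: -1 little
 * endian, 1 big endian, 0 native order with a leading BOM.
 * Returns the number of bytes written. */
static size_t encode_utf32_sketch(const uint32_t *cp, size_t n,
                                  int byteorder, unsigned char *out)
{
    int little = byteorder == -1 ||
                 (byteorder == 0 && native_is_little_endian());
    size_t pos = 0, i;

    if (byteorder == 0) {             /* only native mode writes a BOM */
        put_u32(out, 0xFEFF, little);
        pos = 4;
    }
    for (i = 0; i < n; i++, pos += 4)
        put_u32(out + pos, cp[i], little);
    return pos;
}
```

Encoding two code points therefore produces 8 bytes in the explicit modes and
12 bytes in native mode, the extra four being the BOM.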

.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)

   Return a Python string using the UTF-32 encoding in native byte order. The
   string always starts with a BOM mark. Error handling is "strict". Return
   *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6


UTF-16 Codecs
"""""""""""""

These are the UTF-16 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first two bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output (where it will result in
   either a ``\ufeff`` or a ``\ufffe`` character).

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

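The BOM rule for :cfunc:`PyUnicode_DecodeUTF16` can be summarised by a small
plain C sketch. It is a reading of the description above rather than the
decoder's code; ``utf16_check_bom`` is an assumed name. The function looks at
the first two bytes only when ``*byteorder`` is zero, updates ``*byteorder`` if
a BOM is found, and reports how many bytes the caller should skip.

```c
#include <stddef.h>

/* Inspect the start of a UTF-16 byte buffer. If *byteorder is 0 and a
 * BOM is present, set *byteorder to -1 (little endian) or 1 (big
 * endian) and return 2, the number of BOM bytes to skip. In all other
 * cases leave *byteorder untouched and return 0, so that a BOM under
 * an explicit byte order is kept as ordinary data. */
static size_t utf16_check_bom(const unsigned char *s, size_t size,
                              int *byteorder)
{
    if (*byteorder != 0 || size < 2)
        return 0;
    if (s[0] == 0xFF && s[1] == 0xFE) {   /* U+FEFF stored little endian */
        *byteorder = -1;
        return 2;
    }
    if (s[0] == 0xFE && s[1] == 0xFF) {   /* U+FEFF stored big endian */
        *byteorder = 1;
        return 2;
    }
    return 0;
}
```

When no BOM is found in native mode, the sketch leaves ``*byteorder`` at zero;
a real caller would then continue in the machine's native order.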

.. cfunction:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size* and an :ctype:`int *`
      type for *consumed*. This might require changes in your code for
      properly supporting 64-bit systems.

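The two kinds of incomplete tail named above, an odd byte count and a split
surrogate pair, can be detected with a few lines of plain C. The sketch below
is illustrative only; ``utf16_complete_prefix`` is an assumed name, the byte
order is passed in explicitly, and input that is invalid for other reasons is
not examined.

```c
#include <stddef.h>

/* Return how many leading bytes of a UTF-16 buffer can be decoded
 * without cutting a code unit or a surrogate pair in half.
 * little_endian is nonzero for little-endian data. */
static size_t utf16_complete_prefix(const unsigned char *s, size_t size,
                                    int little_endian)
{
    size_t usable = size - (size % 2);    /* drop a dangling odd byte */
    unsigned int last;

    if (usable < 2)
        return 0;
    last = little_endian
        ? (unsigned int)(s[usable - 2] | (s[usable - 1] << 8))
        : (unsigned int)((s[usable - 2] << 8) | s[usable - 1]);
    /* A high surrogate at the very end is waiting for its partner. */
    if (last >= 0xD800 && last <= 0xDBFF)
        usable -= 2;
    return usable;
}
```

The bytes beyond the returned offset would be carried over and decoded
together with the next chunk of input.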

.. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-16 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM mark)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended.

   If *Py_UNICODE_WIDE* is defined, a single :ctype:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :ctype:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)

   Return a Python string using the UTF-16 encoding in native byte order. The
   string always starts with a BOM mark. Error handling is "strict". Return
   *NULL* if an exception was raised by the codec.


UTF-7 Codecs
""""""""""""

These are the UTF-7 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF7`. If
   *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
   be treated as an error. Those bytes will not be decoded and the number of
   bytes that have been decoded will be stored in *consumed*.


.. cfunction:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-7 and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.

   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
   special meaning) will be encoded in base-64. If *base64WhiteSpace* is
   nonzero, whitespace will be encoded in base-64. Both are set to zero for the
   Python "utf-7" codec.


Unicode-Escape Codecs
"""""""""""""""""""""

These are the "Unicode Escape" codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using Unicode-Escape and
   return a Python string object. Return *NULL* if an exception was raised by the
   codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Unicode-Escape and return the result as a Python
   string object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


Raw-Unicode-Escape Codecs
"""""""""""""""""""""""""

These are the "Raw Unicode Escape" codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
   and return a Python string object. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Raw-Unicode-Escape and return the result as a
   Python string object. Error handling is "strict". Return *NULL* if an exception
   was raised by the codec.


Latin-1 Codecs
""""""""""""""

These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
ordinals, and only these are accepted by the codecs during encoding.


.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using Latin-1 and return
   a Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)

   Encode a Unicode object using Latin-1 and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


ASCII Codecs
""""""""""""

These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
codes generate errors.


.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using ASCII and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)

   Encode a Unicode object using ASCII and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


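As a rough illustration of the 7-bit rule for the ASCII codec, the standalone
sketch below widens bytes to ordinals and stops at the first byte above
``0x7F``. ``ascii_decode_model`` is a hypothetical name, and the ``-1`` return
value merely stands in for the exception the real codec would raise under
"strict" error handling.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical model of strict ASCII decoding: widen each byte to an
      ordinal.  Returns the number of ordinals written, or -1 at the first
      byte outside the 7-bit range. */
   long
   ascii_decode_model(const char *s, size_t size, unsigned long *out)
   {
       size_t i;

       for (i = 0; i < size; i++) {
           unsigned char byte = (unsigned char)s[i];
           if (byte > 0x7F)
               return -1;
           out[i] = byte;
       }
       return (long)size;
   }
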
Character Map Codecs
""""""""""""""""""""

These are the mapping codec APIs.

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses a mapping to encode
and decode characters.

Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or None
(meaning "undefined mapping" and causing an error).

Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or None
(meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode or
Latin-1 ordinal, respectively. Because of this, mappings only need to contain
those entries which map characters to different code points.


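The lookup rules above can be sketched in plain C. This is a standalone model
under stated assumptions, not the codec's implementation: the sparse table type
``charmap_entry``, the marker ``CHARMAP_UNDEFINED`` (standing in for a None
entry) and the function ``charmap_decode_model`` are all hypothetical names.

.. code-block:: c

   #include <stddef.h>

   #define CHARMAP_UNDEFINED (-1L)   /* stands in for a None entry */

   struct charmap_entry {
       unsigned char byte;           /* encoded byte (the mapping key) */
       long ordinal;                 /* Unicode ordinal or CHARMAP_UNDEFINED */
   };

   /* Hypothetical model of charmap decoding.  A byte without an entry is
      copied as-is (its value becomes the ordinal); an entry marked
      CHARMAP_UNDEFINED aborts with -1, as "strict" error handling would.
      Returns the number of ordinals written on success. */
   long
   charmap_decode_model(const unsigned char *s, size_t size,
                        const struct charmap_entry *map, size_t map_size,
                        long *out)
   {
       size_t i, j;

       for (i = 0; i < size; i++) {
           long ordinal = s[i];      /* default when the lookup fails */
           for (j = 0; j < map_size; j++) {
               if (map[j].byte == s[i]) {
                   ordinal = map[j].ordinal;
                   break;
               }
           }
           if (ordinal == CHARMAP_UNDEFINED)
               return -1;
           out[i] = ordinal;
       }
       return (long)size;
   }

Because unmapped bytes fall through unchanged, a table for a Latin-1 variant
would only need entries for the positions where it differs from Latin-1.
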
.. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object. Return *NULL* if an exception was raised by the
   codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can
   be a dictionary mapping bytes or a Unicode string, which is treated as a lookup
   table. Byte values greater than the length of the string and U+FFFE
   "characters" are treated as "undefined mapping".

   .. versionchanged:: 2.4
      Allowed unicode string as mapping argument.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using the given
   *mapping* object and return a Python string object. Return *NULL* if an
   exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

   Encode a Unicode object using the given *mapping* object and return the result
   as a Python string object. Error handling is "strict". Return *NULL* if an
   exception was raised by the codec.

The following codec API is special in that it maps Unicode to Unicode.


.. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)

   Translate a :ctype:`Py_UNICODE` buffer of the given length by applying a
   character mapping *table* to it and return the resulting Unicode object. Return
   *NULL* when an exception was raised by the codec.

   The mapping *table* must map Unicode ordinal integers to Unicode ordinal
   integers or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.


.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.

   .. versionadded:: 2.5


.. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using MBCS and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


Methods & Slots
"""""""""""""""

.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

They all return *NULL* or ``-1`` if an exception occurs.


.. cfunction:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string, giving a list of Unicode strings. If *sep* is *NULL*, splitting
   will be done at all whitespace substrings. Otherwise, splits occur at the given
   separator. At most *maxsplit* splits will be done. If negative, no limit is
   set. Separators are not included in the resulting list.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *maxsplit*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break. If *keepend* is 0, the line break
   characters are not included in the resulting strings.


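The CRLF rule of :cfunc:`PyUnicode_Splitlines` can be pictured with a small
standalone sketch on plain bytes. ``next_line_model`` is a hypothetical helper,
not part of the API, and it only knows ``\n``, ``\r`` and ``\r\n``, whereas the
real function also recognizes other Unicode line breaks.

.. code-block:: c

   #include <stddef.h>

   /* Hypothetical model of finding one line.  Returns the offset just past
      the first line of s, including its line break, and stores the length
      of the line without the break in *content_len. */
   size_t
   next_line_model(const char *s, size_t size, size_t *content_len)
   {
       size_t i = 0;

       while (i < size && s[i] != '\n' && s[i] != '\r')
           i++;
       *content_len = i;
       if (i < size) {
           if (s[i] == '\r' && i + 1 < size && s[i + 1] == '\n')
               i += 2;             /* CRLF counts as one line break */
           else
               i += 1;
       }
       return i;
   }

In terms of this model, a nonzero *keepend* corresponds to keeping everything
up to the returned offset, while a *keepend* of 0 corresponds to keeping only
the first ``*content_len`` bytes.
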
.. cfunction:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be *NULL* which indicates to
   use the default error handling.


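The three outcomes of a table lookup in :cfunc:`PyUnicode_Translate` (replace,
delete, leave untouched) can be modelled in plain C. The sketch below is not
the real implementation; ``translate_entry``, ``TRANSLATE_DELETE`` (standing in
for a None entry) and ``translate_model`` are hypothetical names.

.. code-block:: c

   #include <stddef.h>

   #define TRANSLATE_DELETE (-1L)    /* stands in for a None entry */

   struct translate_entry {
       long from;                    /* ordinal to look up */
       long to;                      /* replacement or TRANSLATE_DELETE */
   };

   /* Hypothetical model of table-driven translation, done in place.
      Returns the new length of text. */
   size_t
   translate_model(long *text, size_t size,
                   const struct translate_entry *table, size_t table_size)
   {
       size_t i, j, written = 0;

       for (i = 0; i < size; i++) {
           long ordinal = text[i];   /* unmapped ordinals stay untouched */
           for (j = 0; j < table_size; j++) {
               if (table[j].from == text[i]) {
                   ordinal = table[j].to;
                   break;
               }
           }
           if (ordinal != TRANSLATE_DELETE)
               text[written++] = ordinal;
       }
       return written;
   }
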
.. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given separator and return the resulting
   Unicode string.


.. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return 1 if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
   0 otherwise. Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.


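The meaning of *direction* in :cfunc:`PyUnicode_Tailmatch` is easy to get
backwards, so here is a standalone model on plain C strings. It is only a
sketch: ``tailmatch_model`` is a hypothetical name, and it assumes
``0 <= start <= end <= strlen(str)`` instead of clipping the indices.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical model of the tail match: direction == -1 tests whether
      substr is a prefix of str[start:end], direction == 1 whether it is a
      suffix.  Returns 1 on a match and 0 otherwise. */
   int
   tailmatch_model(const char *str, const char *substr,
                   size_t start, size_t end, int direction)
   {
       size_t sub_len = strlen(substr);

       if (sub_len > end - start)
           return 0;
       if (direction < 0)
           return memcmp(str + start, substr, sub_len) == 0;
       return memcmp(str + end - sub_len, substr, sub_len) == 0;
   }
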
.. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
   backward search). The return value is the index of the first match; a value of
   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   occurred and an exception has been set.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.


.. cfunction:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``. Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type and used an :ctype:`int`
      type for *start* and *end*. This might require changes in your code for
      properly supporting 64-bit systems.


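"Non-overlapping" in :cfunc:`PyUnicode_Count` means that a match consumes its
characters before the search continues. The standalone sketch below illustrates
this on plain C strings; ``count_model`` is a hypothetical name, it assumes
``0 <= start <= end <= strlen(str)``, and rejecting an empty *substr* with
``-1`` is a simplification of this sketch, not the behaviour of the API.

.. code-block:: c

   #include <stddef.h>
   #include <string.h>

   /* Hypothetical model of counting non-overlapping occurrences of substr
      in str[start:end]. */
   long
   count_model(const char *str, const char *substr, size_t start, size_t end)
   {
       size_t sub_len = strlen(substr);
       size_t pos = start;
       long count = 0;

       if (sub_len == 0)
           return -1;              /* simplification of this sketch only */
       while (pos + sub_len <= end) {
           if (memcmp(str + pos, substr, sub_len) == 0) {
               count++;
               pos += sub_len;     /* skip the whole match: no overlap */
           }
           else {
               pos++;
           }
       }
       return count;
   }

With this rule ``"aa"`` occurs twice in ``"aaaa"``, not three times.
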
.. cfunction:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == -1 means replace all
   occurrences.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *maxcount*. This might
      require changes in your code for properly supporting 64-bit systems.


.. cfunction:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
   respectively.


.. cfunction:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   :exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
   with a :exc:`UnicodeDecodeError`.

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.


.. cfunction:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``. The *args* argument must be a tuple.


.. cfunction:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return true or false
   accordingly.

   *element* has to coerce to a one-element Unicode string. ``-1`` is returned if
   there was an error.