.. highlightlang:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>

Unicode Objects
^^^^^^^^^^^^^^^

Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:


.. ctype:: Py_UNICODE

   This type represents the storage type which is used by Python internally as
   the basis for holding Unicode ordinals. Python's default builds use a 16-bit
   type for :ctype:`Py_UNICODE` and store Unicode values internally as UCS2. It
   is also possible to build a UCS4 version of Python (most recent Linux
   distributions come with UCS4 builds of Python). These builds then use a
   32-bit type for :ctype:`Py_UNICODE` and store Unicode data internally as
   UCS4. On platforms where :ctype:`wchar_t` is available and compatible with
   the chosen Python Unicode build variant, :ctype:`Py_UNICODE` is a typedef
   alias for :ctype:`wchar_t` to enhance native platform compatibility. On all
   other platforms, :ctype:`Py_UNICODE` is a typedef alias for either
   :ctype:`unsigned short` (UCS2) or :ctype:`unsigned long` (UCS4).

Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.


.. ctype:: PyUnicodeObject

   This subtype of :ctype:`PyObject` represents a Python Unicode object.


.. cvar:: PyTypeObject PyUnicode_Type

   This instance of :ctype:`PyTypeObject` represents the Python Unicode type. It
   is exposed to Python code as ``str``.

The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:


.. cfunction:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.


.. cfunction:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.


.. cfunction:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the object. *o* has to be a :ctype:`PyUnicodeObject` (not
   checked).


.. cfunction:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the object's internal buffer in bytes. *o* has to be a
   :ctype:`PyUnicodeObject` (not checked).


.. cfunction:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)

   Return a pointer to the internal :ctype:`Py_UNICODE` buffer of the object. *o*
   has to be a :ctype:`PyUnicodeObject` (not checked).


.. cfunction:: const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to the internal buffer of the object. *o* has to be a
   :ctype:`PyUnicodeObject` (not checked).


.. cfunction:: int PyUnicode_ClearFreeList()

   Clear the free list. Return the total number of freed items.


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.


.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a whitespace character.


.. cfunction:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a lowercase character.


.. cfunction:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an uppercase character.


.. cfunction:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a titlecase character.


.. cfunction:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a linebreak character.


.. cfunction:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a decimal character.


.. cfunction:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a digit character.


.. cfunction:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a numeric character.


.. cfunction:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphabetic character.


.. cfunction:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphanumeric character.


.. cfunction:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a printable character.
   Nonprintable characters are those characters defined in the Unicode character
   database as "Other" or "Separator", excepting the ASCII space (0x20) which is
   considered printable. (Note that printable characters in this context are
   those which should not be escaped when :func:`repr` is invoked on a string.
   It has no bearing on the handling of strings written to :data:`sys.stdout` or
   :data:`sys.stderr`.)


These APIs can be used for fast direct character conversions:


.. cfunction:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.


.. cfunction:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.


.. cfunction:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.


.. cfunction:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a decimal positive integer. Return
   ``-1`` if this is not possible. This macro does not raise exceptions.


.. cfunction:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible. This macro does not raise exceptions.


.. cfunction:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
   possible. This macro does not raise exceptions.


Plain Py_UNICODE
""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:


.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the :ctype:`Py_UNICODE` buffer *u* of the given
   size. *u* may be *NULL*, which causes the contents to be undefined. It is the
   user's responsibility to fill in the needed data. The buffer is copied into
   the new object. If the buffer is not *NULL*, the return value might be a
   shared object. Therefore, modification of the resulting Unicode object is
   only allowed when *u* is *NULL*.


.. cfunction:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the char buffer *u*. The bytes will be interpreted
   as being UTF-8 encoded. *u* may also be *NULL*, which causes the contents to
   be undefined. It is the user's responsibility to fill in the needed data. The
   buffer is copied into the new object. If the buffer is not *NULL*, the return
   value might be a shared object. Therefore, modification of the resulting
   Unicode object is only allowed when *u* is *NULL*.


.. cfunction:: PyObject *PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.


.. cfunction:: PyObject* PyUnicode_FromFormat(const char *format, ...)

   Take a C :cfunc:`printf`\ -style *format* string and a variable number of
   arguments, calculate the size of the resulting Python unicode string and return
   a string with the values formatted into it. The variable arguments must be C
   types and must correspond exactly to the format characters in the *format*
   string. The following format characters are allowed:

   .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
   .. % because not all compilers support the %z width modifier -- we fake it
   .. % when necessary via interpolating PY_FORMAT_SIZE_T.

   +-------------------+---------------------+--------------------------------+
   | Format Characters | Type                | Comment                        |
   +===================+=====================+================================+
   | :attr:`%%`        | *n/a*               | The literal % character.       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%c`        | int                 | A single character,            |
   |                   |                     | represented as a C int.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%d`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%d")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%u`        | unsigned int        | Exactly equivalent to          |
   |                   |                     | ``printf("%u")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%ld`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%ld")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
   |                   |                     | ``printf("%lu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zd")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zu`       | size_t              | Exactly equivalent to          |
   |                   |                     | ``printf("%zu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%i`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%i")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%x`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%x")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%s`        | char\*              | A null-terminated C character  |
   |                   |                     | array.                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%p`        | void\*              | The hex representation of a C  |
   |                   |                     | pointer. Mostly equivalent to  |
   |                   |                     | ``printf("%p")`` except that   |
   |                   |                     | it is guaranteed to start with |
   |                   |                     | the literal ``0x`` regardless  |
   |                   |                     | of what the platform's         |
   |                   |                     | ``printf`` yields.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%A`        | PyObject\*          | The result of calling          |
   |                   |                     | :func:`ascii`.                 |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%U`        | PyObject\*          | A unicode object.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
   |                   |                     | *NULL*) and a null-terminated  |
   |                   |                     | C character array as a second  |
   |                   |                     | parameter (which will be used, |
   |                   |                     | if the first parameter is      |
   |                   |                     | *NULL*).                       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling          |
   |                   |                     | :cfunc:`PyObject_Str`.         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling          |
   |                   |                     | :cfunc:`PyObject_Repr`.        |
   +-------------------+---------------------+--------------------------------+

   An unrecognized format character causes all the rest of the format string to be
   copied as-is to the result string, and any extra arguments are discarded.


.. cfunction:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)

   Identical to :cfunc:`PyUnicode_FromFormat` except that it takes exactly two
   arguments.


.. cfunction:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal :ctype:`Py_UNICODE`
   buffer, or *NULL* if *unicode* is not a Unicode object.


.. cfunction:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the length of the Unicode object.


.. cfunction:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.

   :class:`bytes`, :class:`bytearray` and other char buffer compatible objects
   are decoded according to the given *encoding* and using the error handling
   defined by *errors*. Both can be *NULL* to have the interface use the default
   values (see the next section for details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns *NULL* if there was an error. The caller is responsible for
   decref'ing the returned objects.


.. cfunction:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
   throughout the interpreter whenever coercion to Unicode is needed.

If the platform supports :ctype:`wchar_t` and provides a header file ``wchar.h``,
Python can interface directly to this type using the following functions.
Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
the system's :ctype:`wchar_t`.


File System Encoding
""""""""""""""""""""

To encode and decode file names and other environment strings,
:cdata:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
encode file names during argument parsing, the ``"O&"`` converter should be
used, passing :cfunc:`PyUnicode_FSConverter` as the conversion function:

.. cfunction:: int PyUnicode_FSConverter(PyObject* obj, void* result)

   Convert *obj* into *result*, using :cdata:`Py_FileSystemDefaultEncoding`
   and the ``"surrogateescape"`` error handler. *result* must point to a
   ``PyObject*``, which receives a :class:`bytes` object that must be released
   when it is no longer used.

   .. versionadded:: 3.1

.. cfunction:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)

   Decode a string of *size* bytes using :cdata:`Py_FileSystemDefaultEncoding`
   and the ``"surrogateescape"`` error handler.

   If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.

.. cfunction:: PyObject* PyUnicode_DecodeFSDefault(const char *s)

   Decode a null-terminated string using :cdata:`Py_FileSystemDefaultEncoding`
   and the ``"surrogateescape"`` error handler.

   If :cdata:`Py_FileSystemDefaultEncoding` is not set, fall back to UTF-8.

   Use :cfunc:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.


wchar_t Support
"""""""""""""""

wchar_t support for platforms which support it:

.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :ctype:`wchar_t` buffer *w* of the given size.
   Passing -1 as the size indicates that the function must itself compute the length,
   using wcslen.
   Return *NULL* on failure.


.. cfunction:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :ctype:`wchar_t` buffer *w*. At most
   *size* :ctype:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :ctype:`wchar_t` characters
   copied or -1 in case of an error. Note that the resulting :ctype:`wchar_t`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :ctype:`wchar_t` string is 0-terminated in case this is
   required by the application.


.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, *encoding* and *errors*. These
parameters have the same semantics as the ones of the built-in :func:`str`
string object constructor.

Setting *encoding* to *NULL* causes the default encoding to be used,
which is UTF-8. The file system calls should use
:cfunc:`PyUnicode_FSConverter` for encoding file names. This uses the
variable :cdata:`Py_FileSystemDefaultEncoding` internally. This
variable should be treated as read-only: on some systems, it will be a
pointer to a static string, on others, it will change at run-time
(such as when the application invokes setlocale).

Error handling is set by *errors*, which may also be set to *NULL*, meaning to
use the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface. For simplicity, only deviations from
the following generic ones are documented.


Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`str` built-in function. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.


.. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size and return a Python
   bytes object. *encoding* and *errors* have the same meaning as the
   parameters of the same name in the Unicode :meth:`encode` method. The codec
   to be used is looked up using the Python codec registry. Return *NULL* if an
   exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)

   Encode a Unicode object and return the result as a Python bytes object.
   *encoding* and *errors* have the same meaning as the parameters of the same
   name in the Unicode :meth:`encode` method. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

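The role of *consumed* can be pictured with a small standalone sketch. The
helper ``utf8_complete_prefix`` below is hypothetical and not part of the Python
API. It only inspects lead bytes (it does not validate continuation bytes) and
counts how many leading bytes of a buffer form complete UTF-8 sequences, which
is the kind of value a stateful decoder would report so that the caller can
retry the leftover bytes once more input has arrived:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API.
   Return the expected sequence length for a UTF-8 lead byte,
   or 0 if the byte cannot start a sequence. */
static size_t utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((lead & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                             /* continuation or invalid byte */
}

/* Return the number of leading bytes of s that form complete sequences.
   A truncated trailing sequence is left out of the count. Scanning also
   stops at a byte that cannot start a sequence; a real decoder would
   report that case as an error instead. */
static size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = 0;
    while (i < size) {
        size_t n = utf8_seq_len(s[i]);
        if (n == 0 || n > size - i)
            break;
        i += n;
    }
    return i;
}
```

For a buffer holding ``a``, a complete three-byte sequence and the first two
bytes of another three-byte sequence, the helper reports four bytes, leaving
the two trailing bytes for the next call.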

.. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-8 and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.


.. cfunction:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)

   Encode a Unicode object using UTF-8 and return the result as a Python bytes
   object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


UTF-32 Codecs
"""""""""""""

These are the UTF-32 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first four bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output.

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   In a narrow build, codepoints outside the BMP will be decoded as surrogate
   pairs.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

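The surrogate pair produced in a narrow build for a code point outside the BMP
follows the standard UTF-16 arithmetic. The sketch below shows that arithmetic
in plain C; the helper ``split_surrogates`` is hypothetical, is not part of the
Python API, and is not taken from the decoder's source:

```c
#include <stdint.h>

/* Hypothetical helper, not part of the Python C API.
   Split a code point above U+FFFF into a UTF-16 surrogate pair.
   Return 0 on success, -1 if cp is not in U+10000..U+10FFFF. */
static int split_surrogates(uint32_t cp, uint16_t *high, uint16_t *low)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return -1;
    cp -= 0x10000;                              /* 20 significant bits remain */
    *high = (uint16_t)(0xD800 + (cp >> 10));    /* upper 10 bits */
    *low  = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* lower 10 bits */
    return 0;
}
```

For example, U+1F600 splits into the pair ``0xD83D``, ``0xDE00``.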

.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
   by four) as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.


.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is not defined, surrogate pairs will be output
   as a single codepoint.

   Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)

   Return a Python bytes object using the UTF-32 encoding in native byte
   order. The string always starts with a BOM. Error handling is "strict".
   Return *NULL* if an exception was raised by the codec.


UTF-16 Codecs
"""""""""""""

These are the UTF-16 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first two bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output (where it will result in
   either a ``\ufeff`` or a ``\ufffe`` character).

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

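The BOM rule described for a zero ``*byteorder`` can be sketched in plain C. The
helper ``utf16_check_bom`` is hypothetical and not part of the Python API; it
only illustrates the documented decision, assuming the BOM U+FEFF appears as
the bytes ``FF FE`` in little endian data and ``FE FF`` in big endian data:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API.
   Apply the documented BOM rule for a decoder that starts with
   *byteorder == 0. If a BOM is found, *byteorder becomes -1 (little
   endian) or 1 (big endian); otherwise it is left unchanged.
   The return value is the number of BOM bytes to skip (0 or 2). */
static size_t utf16_check_bom(const unsigned char *s, size_t size,
                              int *byteorder)
{
    /* An explicit byte order means a BOM is treated as ordinary data;
       fewer than two bytes cannot hold a BOM at all. */
    if (*byteorder != 0 || size < 2)
        return 0;
    if (s[0] == 0xFF && s[1] == 0xFE) {
        *byteorder = -1;
        return 2;
    }
    if (s[0] == 0xFE && s[1] == 0xFF) {
        *byteorder = 1;
        return 2;
    }
    return 0;   /* no BOM: keep decoding in native order */
}
```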

.. cfunction:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.


.. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python bytes object holding the UTF-16 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is defined, a single :ctype:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :ctype:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return *NULL* if an exception was raised by the codec.

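The three *byteorder* modes of the encoder can be illustrated with a standalone
sketch. The helpers ``put16`` and ``utf16_put_units`` are hypothetical and not
part of the Python API; they assume the input is already a sequence of 16-bit
code units and only show how the byte order and the BOM are chosen:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store one 16-bit unit in the given byte order. */
static void put16(unsigned char *out, uint16_t u, int little)
{
    out[0] = (unsigned char)(little ? (u & 0xFF) : (u >> 8));
    out[1] = (unsigned char)(little ? (u >> 8) : (u & 0xFF));
}

/* Hypothetical helper, not part of the Python C API.
   Write n code units to out: byteorder -1 is little endian, 1 is big
   endian, 0 is native order preceded by a BOM. out must have room for
   2 * (n + 1) bytes. Return the number of bytes written. */
static size_t utf16_put_units(const uint16_t *units, size_t n,
                              int byteorder, unsigned char *out)
{
    const uint16_t probe = 1;
    /* Native order is found by looking at the first byte of a known value. */
    int little = (byteorder == 0)
        ? (*(const unsigned char *)&probe == 1)
        : (byteorder < 0);
    size_t i, pos = 0;

    if (byteorder == 0) {           /* only native mode writes a BOM */
        put16(out + pos, 0xFEFF, little);
        pos += 2;
    }
    for (i = 0; i < n; i++) {
        put16(out + pos, units[i], little);
        pos += 2;
    }
    return pos;
}
```

With two code units, the explicit modes produce four bytes and the native mode
produces six, the first two being the BOM in the platform's byte order.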

.. cfunction:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)

   Return a Python bytes object using the UTF-16 encoding in native byte
   order. The string always starts with a BOM. Error handling is "strict".
   Return *NULL* if an exception was raised by the codec.


UTF-7 Codecs
""""""""""""

These are the UTF-7 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF7`. If
   *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
   be treated as an error. Those bytes will not be decoded and the number of
   bytes that have been decoded will be stored in *consumed*.


.. cfunction:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-7 and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.

   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
   special meaning) will be encoded in base-64. If *base64WhiteSpace* is
   nonzero, whitespace will be encoded in base-64. Both are set to zero for the
   Python "utf-7" codec.


Unicode-Escape Codecs
"""""""""""""""""""""

These are the "Unicode Escape" codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*. Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using Unicode-Escape and
   return a Python bytes object. Return *NULL* if an exception was raised by the
   codec.


.. cfunction:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Unicode-Escape and return the result as a Python
   bytes object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


Raw-Unicode-Escape Codecs
"""""""""""""""""""""""""

These are the "Raw Unicode Escape" codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*. Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
   and return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.


.. cfunction:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Raw-Unicode-Escape and return the result as a
   Python bytes object. Error handling is "strict". Return *NULL* if an exception
   was raised by the codec.


Latin-1 Codecs
""""""""""""""

These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
ordinals, and only these are accepted by the codecs during encoding.


.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using Latin-1 and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.


.. cfunction:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)

   Encode a Unicode object using Latin-1 and return the result as a Python bytes
   object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


ASCII Codecs
""""""""""""

These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
codes generate errors.


.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using ASCII and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.


.. cfunction:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)

   Encode a Unicode object using ASCII and return the result as a Python bytes
   object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


Character Map Codecs
""""""""""""""""""""

These are the mapping codec APIs.

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.

Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or None
(meaning "undefined mapping" and causing an error).

Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or None
(meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode or
Latin-1 ordinal, respectively. Because of this, mappings only need to contain
those entries which map characters to different code points.


.. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object. Return *NULL* if an exception was raised by the
   codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can
   be a dictionary keyed by byte value or a Unicode string, which is treated as a
   lookup table indexed by byte value. Byte values greater than or equal to the
   length of the string and U+FFFE "characters" are treated as "undefined mapping".


.. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using the given
   *mapping* object and return a Python bytes object. Return *NULL* if an
   exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

   Encode a Unicode object using the given *mapping* object and return the result
   as a Python bytes object. Error handling is "strict". Return *NULL* if an
   exception was raised by the codec.

The following codec API is special in that it maps Unicode to Unicode.


.. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)

   Translate a :ctype:`Py_UNICODE` buffer of the given length by applying a
   character mapping *table* to it and return the resulting Unicode object. Return
   *NULL* when an exception was raised by the codec.

   The *table* must map Unicode ordinal integers to Unicode ordinal integers or
   None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.


MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.


.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.


.. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given size using MBCS and return
   a Python bytes object. Return *NULL* if an exception was raised by the
   codec.


.. cfunction:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python bytes
   object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


Methods & Slots
"""""""""""""""


.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

They all return *NULL* or ``-1`` if an exception occurs.


.. cfunction:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string, giving a list of Unicode strings. If *sep* is *NULL*, splitting
   will be done at all whitespace substrings. Otherwise, splits occur at the given
   separator. At most *maxsplit* splits will be done. If negative, no limit is
   set. Separators are not included in the resulting list.


.. cfunction:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break. If *keepend* is 0, the line break
   characters are not included in the resulting strings.


.. cfunction:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be *NULL*, which selects the
   default error handling.


.. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given separator and return the resulting
   Unicode string.


.. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return 1 if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
   0 otherwise. Return ``-1`` if an error occurred.


.. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
   backward search). The return value is the index of the first match; a value of
   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   occurred and an exception has been set.


.. cfunction:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``. Return ``-1`` if an error occurred.


.. cfunction:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == -1 means replace all
   occurrences.


.. cfunction:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
   respectively.


.. cfunction:: int PyUnicode_CompareWithASCIIString(PyObject *uni, char *string)

   Compare a Unicode object, *uni*, with *string* and return -1, 0, 1 for less
   than, equal, and greater than, respectively.


.. cfunction:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:

   * *NULL* in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   :exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
   with a :exc:`UnicodeDecodeError`.

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.


.. cfunction:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``. The *args* argument must be a tuple.


.. cfunction:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return true (``1``) or
   false (``0``) accordingly.

   *element* has to coerce to a one element Unicode string. ``-1`` is returned if
   there was an error.


.. cfunction:: void PyUnicode_InternInPlace(PyObject **string)

   Intern the argument *\*string* in place. The argument must be the address of a
   pointer variable pointing to a Python Unicode string object. If there is an
   existing interned string that is the same as *\*string*, it sets *\*string* to
   it (decrementing the reference count of the old string object and incrementing
   the reference count of the interned string object), otherwise it leaves
   *\*string* alone and interns it (incrementing its reference count).
   (Clarification: even though there is a lot of talk about reference counts, think
   of this function as reference-count-neutral; you own the object after the call
   if and only if you owned it before the call.)


.. cfunction:: PyObject* PyUnicode_InternFromString(const char *v)

   A combination of :cfunc:`PyUnicode_FromString` and
   :cfunc:`PyUnicode_InternInPlace`, returning either a new Unicode string object
   that has been interned, or a new ("owned") reference to an earlier interned
   string object with the same value.
