Blame - Doc/c-api/unicode.rst - platform/external/python/cpython2

.. highlightlang:: c
				2
				3	.. _unicodeobjects:
				4
				5	Unicode Objects and Codecs
				6	--------------------------
				7
				8	.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
				9
				10	Unicode Objects
				11	^^^^^^^^^^^^^^^
				12
				13	These are the basic Unicode object types used for the Unicode implementation in
				14	Python:
				15
				16	.. % --- Unicode Type -------------------------------------------------------
				17
				18
				19	.. ctype:: Py_UNICODE
				20
				21	This type represents the storage type which is used by Python internally as
				22	basis for holding Unicode ordinals. Python's default builds use a 16-bit type
				23	for :ctype:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
				24	possible to build a UCS4 version of Python (most recent Linux distributions come
				25	with UCS4 builds of Python). These builds then use a 32-bit type for
				26	:ctype:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
				27	where :ctype:`wchar_t` is available and compatible with the chosen Python
				28	Unicode build variant, :ctype:`Py_UNICODE` is a typedef alias for
				29	:ctype:`wchar_t` to enhance native platform compatibility. On all other
				30	platforms, :ctype:`Py_UNICODE` is a typedef alias for either :ctype:`unsigned
				31	short` (UCS2) or :ctype:`unsigned long` (UCS4).
				32
				33	Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
				34	this in mind when writing extensions or interfaces.
				35
				36
				37	.. ctype:: PyUnicodeObject
				38
				39	This subtype of :ctype:`PyObject` represents a Python Unicode object.
				40
				41
				42	.. cvar:: PyTypeObject PyUnicode_Type
				43
				44	This instance of :ctype:`PyTypeObject` represents the Python Unicode type. It
				45	is exposed to Python code as ``str``.
				46
				47	The following APIs are really C macros and can be used to do fast checks and to
				48	access internal read-only data of Unicode objects:
				49
				50
				51	.. cfunction:: int PyUnicode_Check(PyObject *o)
				52
				53	Return true if the object o is a Unicode object or an instance of a Unicode
				54	subtype.
				55
				56
				57	.. cfunction:: int PyUnicode_CheckExact(PyObject *o)
				58
				59	Return true if the object o is a Unicode object, but not an instance of a
				60	subtype.
				61
				62
				63	.. cfunction:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
				64
				65	Return the size of the object. o has to be a :ctype:`PyUnicodeObject` (not
				66	checked).
				67
				68
				69	.. cfunction:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
				70
				71	Return the size of the object's internal buffer in bytes. o has to be a
				72	:ctype:`PyUnicodeObject` (not checked).
				73
				74
				75	.. cfunction:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
				76
				77	Return a pointer to the internal :ctype:`Py_UNICODE` buffer of the object. o
				78	has to be a :ctype:`PyUnicodeObject` (not checked).
				79
				80
				81	.. cfunction:: const char* PyUnicode_AS_DATA(PyObject *o)
				82
				83	Return a pointer to the internal buffer of the object. o has to be a
				84	:ctype:`PyUnicodeObject` (not checked).
				85
				87	.. cfunction:: int PyUnicode_ClearFreeList(void)
				88
				89	Clear the free list. Return the total number of freed items.
				90
Unicode provides many different character properties. The most often needed ones
				92	are available through these macros which are mapped to C functions depending on
				93	the Python configuration.
				94
				95	.. % --- Unicode character properties ---------------------------------------
				96
				97
				98	.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
				99
				100	Return 1 or 0 depending on whether ch is a whitespace character.
				101
				102
				103	.. cfunction:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
				104
				105	Return 1 or 0 depending on whether ch is a lowercase character.
				106
				107
				108	.. cfunction:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
				109
				110	Return 1 or 0 depending on whether ch is an uppercase character.
				111
				112
				113	.. cfunction:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
				114
				115	Return 1 or 0 depending on whether ch is a titlecase character.
				116
				117
				118	.. cfunction:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
				119
				120	Return 1 or 0 depending on whether ch is a linebreak character.
				121
				122
				123	.. cfunction:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
				124
				125	Return 1 or 0 depending on whether ch is a decimal character.
				126
				127
				128	.. cfunction:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
				129
				130	Return 1 or 0 depending on whether ch is a digit character.
				131
				132
				133	.. cfunction:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
				134
				135	Return 1 or 0 depending on whether ch is a numeric character.
				136
				137
				138	.. cfunction:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
				139
				140	Return 1 or 0 depending on whether ch is an alphabetic character.
				141
				142
				143	.. cfunction:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
				144
				145	Return 1 or 0 depending on whether ch is an alphanumeric character.
				146
				147	These APIs can be used for fast direct character conversions:
				148
				149
				150	.. cfunction:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
				151
				152	Return the character ch converted to lower case.
				153
				154
				155	.. cfunction:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
				156
				157	Return the character ch converted to upper case.
				158
				159
				160	.. cfunction:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
				161
				162	Return the character ch converted to title case.
				163
				164
				165	.. cfunction:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
				166
				167	Return the character ch converted to a decimal positive integer. Return
				168	``-1`` if this is not possible. This macro does not raise exceptions.
				169
				170
				171	.. cfunction:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
				172
				173	Return the character ch converted to a single digit integer. Return ``-1`` if
				174	this is not possible. This macro does not raise exceptions.
				175
				176
				177	.. cfunction:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
				178
				179	Return the character ch converted to a double. Return ``-1.0`` if this is not
				180	possible. This macro does not raise exceptions.
				181
				182	To create Unicode objects and access their basic sequence properties, use these
				183	APIs:
				184
				185	.. % --- Plain Py_UNICODE ---------------------------------------------------
				186
				187
				188	.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
				189
				190	Create a Unicode Object from the Py_UNICODE buffer u of the given size. u
				191	may be NULL which causes the contents to be undefined. It is the user's
				192	responsibility to fill in the needed data. The buffer is copied into the new
				193	object. If the buffer is not NULL, the return value might be a shared object.
				194	Therefore, modification of the resulting Unicode object is only allowed when u
				195	is NULL.
				196
				197
				198	.. cfunction:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
				199
				200	Create a Unicode Object from the char buffer u. The bytes will be interpreted
				201	as being UTF-8 encoded. u may also be NULL which
				202	causes the contents to be undefined. It is the user's responsibility to fill in
				203	the needed data. The buffer is copied into the new object. If the buffer is not
				204	NULL, the return value might be a shared object. Therefore, modification of
				205	the resulting Unicode object is only allowed when u is NULL.
				206
				207
.. cfunction:: PyObject* PyUnicode_FromString(const char *u)

Create a Unicode object from a UTF-8 encoded null-terminated char buffer
u.
				212
				213
				214	.. cfunction:: PyObject* PyUnicode_FromFormat(const char *format, ...)
				215
				216	Take a C :cfunc:`printf`\ -style format string and a variable number of
				217	arguments, calculate the size of the resulting Python unicode string and return
				218	a string with the values formatted into it. The variable arguments must be C
				219	types and must correspond exactly to the format characters in the format
				220	string. The following format characters are allowed:
				221
				222	.. % The descriptions for %zd and %zu are wrong, but the truth is complicated
				223	.. % because not all compilers support the %z width modifier -- we fake it
				224	.. % when necessary via interpolating PY_FORMAT_SIZE_T.
				225
+-------------------+---------------------+--------------------------------+
| Format Characters | Type                | Comment                        |
+===================+=====================+================================+
| :attr:`%%`        | n/a                 | The literal % character.       |
+-------------------+---------------------+--------------------------------+
| :attr:`%c`        | int                 | A single character,            |
|                   |                     | represented as a C int.        |
+-------------------+---------------------+--------------------------------+
| :attr:`%d`        | int                 | Exactly equivalent to          |
|                   |                     | ``printf("%d")``.              |
+-------------------+---------------------+--------------------------------+
| :attr:`%u`        | unsigned int        | Exactly equivalent to          |
|                   |                     | ``printf("%u")``.              |
+-------------------+---------------------+--------------------------------+
| :attr:`%ld`       | long                | Exactly equivalent to          |
|                   |                     | ``printf("%ld")``.             |
+-------------------+---------------------+--------------------------------+
| :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
|                   |                     | ``printf("%lu")``.             |
+-------------------+---------------------+--------------------------------+
| :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
|                   |                     | ``printf("%zd")``.             |
+-------------------+---------------------+--------------------------------+
| :attr:`%zu`       | size_t              | Exactly equivalent to          |
|                   |                     | ``printf("%zu")``.             |
+-------------------+---------------------+--------------------------------+
| :attr:`%i`        | int                 | Exactly equivalent to          |
|                   |                     | ``printf("%i")``.              |
+-------------------+---------------------+--------------------------------+
| :attr:`%x`        | int                 | Exactly equivalent to          |
|                   |                     | ``printf("%x")``.              |
+-------------------+---------------------+--------------------------------+
| :attr:`%s`        | char\*              | A null-terminated C character  |
|                   |                     | array.                         |
+-------------------+---------------------+--------------------------------+
| :attr:`%p`        | void\*              | The hex representation of a C  |
|                   |                     | pointer. Mostly equivalent to  |
|                   |                     | ``printf("%p")`` except that   |
|                   |                     | it is guaranteed to start with |
|                   |                     | the literal ``0x`` regardless  |
|                   |                     | of what the platform's         |
|                   |                     | ``printf`` yields.             |
+-------------------+---------------------+--------------------------------+
| :attr:`%U`        | PyObject\*          | A unicode object.              |
+-------------------+---------------------+--------------------------------+
| :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
|                   |                     | NULL) and a null-terminated    |
|                   |                     | C character array as a second  |
|                   |                     | parameter (which will be used, |
|                   |                     | if the first parameter is      |
|                   |                     | NULL).                         |
+-------------------+---------------------+--------------------------------+
| :attr:`%S`        | PyObject\*          | The result of calling          |
|                   |                     | :func:`PyObject_Unicode`.      |
+-------------------+---------------------+--------------------------------+
| :attr:`%R`        | PyObject\*          | The result of calling          |
|                   |                     | :func:`PyObject_Repr`.         |
+-------------------+---------------------+--------------------------------+
				284
				285	An unrecognized format character causes all the rest of the format string to be
				286	copied as-is to the result string, and any extra arguments discarded.
				287
				288
				289	.. cfunction:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
				290
				291	Identical to :func:`PyUnicode_FromFormat` except that it takes exactly two
				292	arguments.
				293
				294
				295	.. cfunction:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
				296
				297	Return a read-only pointer to the Unicode object's internal :ctype:`Py_UNICODE`
				298	buffer, NULL if unicode is not a Unicode object.
				299
				300
				301	.. cfunction:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
				302
				303	Return the length of the Unicode object.
				304
				305
.. cfunction:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

Coerce an encoded object obj to a Unicode object and return a reference with
incremented refcount.
				310
				311	String and other char buffer compatible objects are decoded according to the
				312	given encoding and using the error handling defined by errors. Both can be
				313	NULL to have the interface use the default values (see the next section for
				314	details).
				315
				316	All other objects, including Unicode objects, cause a :exc:`TypeError` to be
				317	set.
				318
				319	The API returns NULL if there was an error. The caller is responsible for
				320	decref'ing the returned objects.
				321
				322
				323	.. cfunction:: PyObject* PyUnicode_FromObject(PyObject *obj)
				324
				325	Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
				326	throughout the interpreter whenever coercion to Unicode is needed.
				327
				328	If the platform supports :ctype:`wchar_t` and provides a header file wchar.h,
				329	Python can interface directly to this type using the following functions.
				330	Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
				331	the system's :ctype:`wchar_t`.
				332
				333	.. % --- wchar_t support for platforms which support it ---------------------
				334
				335
				336	.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
				337
				338	Create a Unicode object from the :ctype:`wchar_t` buffer w of the given size.
Passing -1 as the size indicates that the function must itself compute the length,
using wcslen.
Return NULL on failure.
				342
				343
.. cfunction:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)
				345
				346	Copy the Unicode object contents into the :ctype:`wchar_t` buffer w. At most
				347	size :ctype:`wchar_t` characters are copied (excluding a possibly trailing
				348	0-termination character). Return the number of :ctype:`wchar_t` characters
				349	copied or -1 in case of an error. Note that the resulting :ctype:`wchar_t`
				350	string may or may not be 0-terminated. It is the responsibility of the caller
				351	to make sure that the :ctype:`wchar_t` string is 0-terminated in case this is
				352	required by the application.
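
Because the copied string may lack a terminator, a caller typically reserves
one extra element and writes the terminator itself. The following is a minimal
sketch of that pattern in plain C. ``copy_wide`` is a hypothetical stand-in that
only mimics the contract described above (copy at most size characters, return
the count); it is not part of the Python API, and ``copy_wide_terminated`` is
likewise an illustrative name.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in with the documented contract: copy at most `size`
   characters from `src` (of length `len`) and return the number copied.
   No terminator is written when the source fills the buffer. */
static long copy_wide(const wchar_t *src, size_t len, wchar_t *w, size_t size)
{
    size_t n = len < size ? len : size;
    size_t i;
    for (i = 0; i < n; i++)
        w[i] = src[i];
    return (long)n;
}

/* Caller-side pattern: `buf` holds `cap` elements. One element is reserved
   for the terminator, which the caller always writes itself. Returns the
   string length, or -1 if the buffer cannot hold even the terminator. */
static long copy_wide_terminated(const wchar_t *src, size_t len,
                                 wchar_t *buf, size_t cap)
{
    long n;
    if (cap == 0)
        return -1;
    n = copy_wide(src, len, buf, cap - 1);
    buf[n] = L'\0';
    return n;
}
```

With a four-element buffer, copying a seven-character source this way keeps
three characters and leaves the last element for the terminator.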
				353
				354
				355	.. _builtincodecs:
				356
				357	Built-in Codecs
				358	^^^^^^^^^^^^^^^
				359
				360	Python provides a set of builtin codecs which are written in C for speed. All of
				361	these codecs are directly usable via the following functions.
				362
				363	Many of the following APIs take two arguments encoding and errors. These
				364	parameters encoding and errors have the same semantics as the ones of the
				365	builtin unicode() Unicode object constructor.
				366
				367	Setting encoding to NULL causes the default encoding to be used which is
				368	ASCII. The file system calls should use :cdata:`Py_FileSystemDefaultEncoding`
				369	as the encoding for file names. This variable should be treated as read-only: On
				370	some systems, it will be a pointer to a static string, on others, it will change
				371	at run-time (such as when the application invokes setlocale).
				372
				373	Error handling is set by errors which may also be set to NULL meaning to use
				374	the default handling defined for the codec. Default error handling for all
				375	builtin codecs is "strict" (:exc:`ValueError` is raised).
				376
The codecs all use a similar interface. Only deviations from the following
generic ones are documented for simplicity.
				379
				380	These are the generic codec APIs:
				381
				382	.. % --- Generic Codecs -----------------------------------------------------
				383
				384
.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
				386
				387	Create a Unicode object by decoding size bytes of the encoded string s.
				388	encoding and errors have the same meaning as the parameters of the same name
				389	in the :func:`unicode` builtin function. The codec to be used is looked up
				390	using the Python codec registry. Return NULL if an exception was raised by
				391	the codec.
				392
				393
.. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
				395
				396	Encode the :ctype:`Py_UNICODE` buffer of the given size and return a Python
				397	string object. encoding and errors have the same meaning as the parameters
				398	of the same name in the Unicode :meth:`encode` method. The codec to be used is
				399	looked up using the Python codec registry. Return NULL if an exception was
				400	raised by the codec.
				401
				402
.. cfunction:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
				404
				405	Encode a Unicode object and return the result as Python string object.
				406	encoding and errors have the same meaning as the parameters of the same name
				407	in the Unicode :meth:`encode` method. The codec to be used is looked up using
				408	the Python codec registry. Return NULL if an exception was raised by the
				409	codec.
				410
				411	These are the UTF-8 codec APIs:
				412
				413	.. % --- UTF-8 Codecs -------------------------------------------------------
				414
				415
.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
				417
				418	Create a Unicode object by decoding size bytes of the UTF-8 encoded string
				419	s. Return NULL if an exception was raised by the codec.
				420
				421
.. cfunction:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
				423
				424	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF8`. If
				425	consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be
				426	treated as an error. Those bytes will not be decoded and the number of bytes
				427	that have been decoded will be stored in consumed.
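
To illustrate what "trailing incomplete byte sequence" means here, the
following plain C sketch computes how many leading bytes of a buffer form
complete UTF-8 sequences. It assumes otherwise well-formed input and does not
validate it; ``utf8_complete_prefix`` is a hypothetical helper written for this
example, not a Python API and not CPython's implementation.

```c
#include <stddef.h>

/* Return the number of leading bytes of `s` that do not end in the middle of
   a multi-byte UTF-8 sequence. Malformed input is returned unchanged so that
   a real decoder can report the error. */
static size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = size;      /* one past the candidate lead byte */
    size_t back = 0;      /* continuation bytes skipped so far */
    size_t need;
    unsigned char lead;

    /* Walk back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (s[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return size;      /* no lead byte found */

    lead = s[i - 1];
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return size;      /* invalid lead byte */

    /* If the final sequence is short, stop just before its lead byte. */
    return (back + 1 < need) ? i - 1 : size;
}
```

A streaming caller would decode only this prefix and keep the remaining bytes
for the next chunk, which is the role consumed plays in the stateful decoder.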
				428
				429
.. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				431
				432	Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-8 and return a
				433	Python string object. Return NULL if an exception was raised by the codec.
				434
				435
				436	.. cfunction:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
				437
				438	Encode a Unicode object using UTF-8 and return the result as Python string
				439	object. Error handling is "strict". Return NULL if an exception was raised
				440	by the codec.
				441
				442	These are the UTF-32 codec APIs:
				443
				444	.. % --- UTF-32 Codecs ------------------------------------------------------ */
				445
				446
.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

Decode size bytes from a UTF-32 encoded buffer string and return the
corresponding Unicode object. errors (if non-NULL) defines the error
handling. It defaults to "strict".

If byteorder is non-NULL, the decoder starts decoding using the given byte
order::

   *byteorder == -1: little endian
   *byteorder == 0:  native order
   *byteorder == 1:  big endian

and then switches if the first four bytes of the input data are a byte order mark
(BOM) and the specified byte order is native order. This BOM is not copied into
the resulting Unicode string. After completion, ``*byteorder`` is set to the
current byte order at the end of input data.
				464
				465	In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.
				466
				467	If byteorder is NULL, the codec starts in native order mode.
				468
				469	Return NULL if an exception was raised by the codec.
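
The surrogate-pair representation used by narrow builds for code points outside
the BMP follows the standard UTF-16 arithmetic. A minimal sketch in plain C,
using a hypothetical helper ``to_utf16_units`` that is not part of the Python
API:

```c
#include <stdint.h>

/* Store code point `cp` as UTF-16 code units in `out`.
   Returns 2 when a surrogate pair was written, 1 for a BMP code point
   (stored as is), or 0 if `cp` lies outside the Unicode range. */
static int to_utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 significant bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: low 10 bits */
    return 2;
}
```

For example, U+1F600 is split into the pair 0xD83D, 0xDE00, so a decoded string
holding it has length 2 on a narrow build.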
				470
				471
.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
				473
				474	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
				475	consumed is not NULL, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
				476	trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
				477	by four) as an error. Those bytes will not be decoded and the number of bytes
				478	that have been decoded will be stored in consumed.
				479
				480
.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
				482
				483	Return a Python bytes object holding the UTF-32 encoded value of the Unicode
				484	data in s. If byteorder is not ``0``, output is written according to the
				485	following byte order::
				486
				487	byteorder == -1: little endian
				488	byteorder == 0: native byte order (writes a BOM mark)
				489	byteorder == 1: big endian
				490
				491	If byteorder is ``0``, the output string will always start with the Unicode BOM
				492	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				493
				494	If Py_UNICODE_WIDE is not defined, surrogate pairs will be output
				495	as a single codepoint.
				496
				497	Return NULL if an exception was raised by the codec.
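
Emitting a surrogate pair as a single code point, as described above for builds
without Py_UNICODE_WIDE, amounts to the inverse UTF-16 arithmetic. The sketch
below is plain C with hypothetical helper names and is not taken from CPython.
It passes unpaired surrogates through unchanged, which is an assumption made
for this example only.

```c
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Read one code point from `units` starting at `*pos` and advance `*pos`
   past the one or two code units consumed. Requires *pos < n. */
static uint32_t next_code_point(const uint16_t *units, size_t n, size_t *pos)
{
    uint16_t hi = units[(*pos)++];
    if (is_high_surrogate(hi) && *pos < n && is_low_surrogate(units[*pos])) {
        uint16_t lo = units[(*pos)++];
        return 0x10000 + (((uint32_t)hi - 0xD800) << 10)
                       + ((uint32_t)lo - 0xDC00);
    }
    return hi;  /* BMP character, or an unpaired surrogate passed through */
}
```

Each value returned would then be written as one four-byte UTF-32 unit.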
				498
				499
				500	.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				501
				502	Return a Python string using the UTF-32 encoding in native byte order. The
				503	string always starts with a BOM mark. Error handling is "strict". Return
				504	NULL if an exception was raised by the codec.
				505
				506
				507	These are the UTF-16 codec APIs:
				508
				509	.. % --- UTF-16 Codecs ------------------------------------------------------ */
				510
				511
.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

Decode size bytes from a UTF-16 encoded buffer string and return the
corresponding Unicode object. errors (if non-NULL) defines the error
handling. It defaults to "strict".

If byteorder is non-NULL, the decoder starts decoding using the given byte
order::

   *byteorder == -1: little endian
   *byteorder == 0:  native order
   *byteorder == 1:  big endian

and then switches if the first two bytes of the input data are a byte order mark
(BOM) and the specified byte order is native order. This BOM is not copied into
the resulting Unicode string. After completion, ``*byteorder`` is set to the
current byte order at the end of input data.
				529
				530	If byteorder is NULL, the codec starts in native order mode.
				531
				532	Return NULL if an exception was raised by the codec.
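
The BOM handling described above can be sketched in plain C as follows.
``utf16_check_bom`` is a hypothetical helper for illustration, not a Python
API. Unlike the real decoder, it leaves the byte order at ``0`` when no BOM is
present instead of resolving it to the platform's native order.

```c
#include <stddef.h>

/* Inspect the start of a UTF-16 buffer. `*byteorder` uses the convention
   above: -1 little endian, 0 native (undecided), 1 big endian.
   A BOM is honoured only when the caller asked for native order.
   Returns the number of leading bytes to skip (0 or 2). */
static size_t utf16_check_bom(const unsigned char *s, size_t size,
                              int *byteorder)
{
    if (*byteorder != 0 || size < 2)
        return 0;
    if (s[0] == 0xFF && s[1] == 0xFE) {   /* U+FEFF stored little endian */
        *byteorder = -1;
        return 2;
    }
    if (s[0] == 0xFE && s[1] == 0xFF) {   /* U+FEFF stored big endian */
        *byteorder = 1;
        return 2;
    }
    return 0;
}
```

Decoding would then start at the returned offset using the detected order.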
				533
				534
.. cfunction:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
				536
				537	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF16`. If
				538	consumed is not NULL, :cfunc:`PyUnicode_DecodeUTF16Stateful` will not treat
				539	trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
				540	split surrogate pair) as an error. Those bytes will not be decoded and the
				541	number of bytes that have been decoded will be stored in consumed.
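
The two kinds of incomplete tail named above, an odd trailing byte and a split
surrogate pair, can be detected with a few lines of plain C. This is a sketch
with a hypothetical helper name, not CPython's implementation. It takes the
byte order as an explicit flag rather than detecting it.

```c
#include <stddef.h>

/* Return how many leading bytes of `s` can be decoded without cutting a
   UTF-16 code unit or a surrogate pair in half. */
static size_t utf16_complete_prefix(const unsigned char *s, size_t size,
                                    int little_endian)
{
    size_t even = size - (size % 2);   /* drop a dangling odd byte */
    unsigned int last;

    if (even < 2)
        return even;
    last = little_endian
        ? (unsigned int)(s[even - 2] | (s[even - 1] << 8))
        : (unsigned int)((s[even - 2] << 8) | s[even - 1]);
    /* A trailing high surrogate still waits for its low surrogate. */
    if (last >= 0xD800 && last <= 0xDBFF)
        return even - 2;
    return even;
}
```

The value returned corresponds to what the stateful decoder would report in
consumed for well-formed input.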
				542
				543
.. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
				545
				546	Return a Python string object holding the UTF-16 encoded value of the Unicode
				547	data in s. If byteorder is not ``0``, output is written according to the
				548	following byte order::
				549
				550	byteorder == -1: little endian
				551	byteorder == 0: native byte order (writes a BOM mark)
				552	byteorder == 1: big endian
				553
				554	If byteorder is ``0``, the output string will always start with the Unicode BOM
				555	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				556
If Py_UNICODE_WIDE is defined, a single :ctype:`Py_UNICODE` value may get
represented as a surrogate pair. If it is not defined, each :ctype:`Py_UNICODE`
value is interpreted as a UCS-2 character.
				560
				561	Return NULL if an exception was raised by the codec.
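
The effect of the byteorder argument on the output bytes can be illustrated
with a small plain C sketch. ``encode_utf16_units`` and
``host_is_little_endian`` are hypothetical names used only here. The sketch
takes ready-made 16-bit code units and performs no surrogate handling or error
checking.

```c
#include <stddef.h>
#include <stdint.h>

/* Report whether the host stores the low byte of a 16-bit value first. */
static int host_is_little_endian(void)
{
    const uint16_t one = 1;
    return *(const unsigned char *)&one == 1;
}

/* Write `n` code units to `out` (which must hold 2 * n + 2 bytes).
   byteorder: -1 little endian, 1 big endian, 0 native order with a BOM.
   Returns the number of bytes written. */
static size_t encode_utf16_units(const uint16_t *units, size_t n,
                                 int byteorder, unsigned char *out)
{
    int little = byteorder == -1 ||
                 (byteorder == 0 && host_is_little_endian());
    size_t pos = 0, i;

    if (byteorder == 0) {                 /* prepend U+FEFF */
        out[pos++] = little ? 0xFF : 0xFE;
        out[pos++] = little ? 0xFE : 0xFF;
    }
    for (i = 0; i < n; i++) {
        unsigned char hi = (unsigned char)(units[i] >> 8);
        unsigned char lo = (unsigned char)(units[i] & 0xFF);
        out[pos++] = little ? lo : hi;
        out[pos++] = little ? hi : lo;
    }
    return pos;
}
```

Only the ``0`` mode produces output that a decoder can interpret without being
told the byte order, since the BOM records it.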
				562
				563
				564	.. cfunction:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				565
				566	Return a Python string using the UTF-16 encoding in native byte order. The
				567	string always starts with a BOM mark. Error handling is "strict". Return
				568	NULL if an exception was raised by the codec.
				569
				570	These are the "Unicode Escape" codec APIs:
				571
				572	.. % --- Unicode-Escape Codecs ----------------------------------------------
				573
				574
.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
				576
				577	Create a Unicode object by decoding size bytes of the Unicode-Escape encoded
				578	string s. Return NULL if an exception was raised by the codec.
				579
				580
				581	.. cfunction:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
				582
				583	Encode the :ctype:`Py_UNICODE` buffer of the given size using Unicode-Escape and
				584	return a Python string object. Return NULL if an exception was raised by the
				585	codec.
				586
				587
				588	.. cfunction:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				589
				590	Encode a Unicode object using Unicode-Escape and return the result as Python
				591	string object. Error handling is "strict". Return NULL if an exception was
				592	raised by the codec.
				593
				594	These are the "Raw Unicode Escape" codec APIs:
				595
				596	.. % --- Raw-Unicode-Escape Codecs ------------------------------------------
				597
				598
.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
				600
				601	Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape
				602	encoded string s. Return NULL if an exception was raised by the codec.
				603
				604
.. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				606
				607	Encode the :ctype:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
				608	and return a Python string object. Return NULL if an exception was raised by
				609	the codec.
				610
				611
				612	.. cfunction:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				613
				614	Encode a Unicode object using Raw-Unicode-Escape and return the result as
				615	Python string object. Error handling is "strict". Return NULL if an exception
				616	was raised by the codec.
				617
				618	These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
				619	ordinals and only these are accepted by the codecs during encoding.
				620
				621	.. % --- Latin-1 Codecs -----------------------------------------------------
				622
				623
.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
				625
				626	Create a Unicode object by decoding size bytes of the Latin-1 encoded string
				627	s. Return NULL if an exception was raised by the codec.
				628
				629
.. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				631
				632	Encode the :ctype:`Py_UNICODE` buffer of the given size using Latin-1 and return
				633	a Python string object. Return NULL if an exception was raised by the codec.
				634
				635
				636	.. cfunction:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				637
				638	Encode a Unicode object using Latin-1 and return the result as Python string
				639	object. Error handling is "strict". Return NULL if an exception was raised
				640	by the codec.
				641
				642	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				643	codes generate errors.
				644
				645	.. % --- ASCII Codecs -------------------------------------------------------
				646
				647
.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
				649
				650	Create a Unicode object by decoding size bytes of the ASCII encoded string
				651	s. Return NULL if an exception was raised by the codec.
				652
				653
.. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				655
				656	Encode the :ctype:`Py_UNICODE` buffer of the given size using ASCII and return a
				657	Python string object. Return NULL if an exception was raised by the codec.
				658
				659
				660	.. cfunction:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				661
				662	Encode a Unicode object using ASCII and return the result as Python string
				663	object. Error handling is "strict". Return NULL if an exception was raised
				664	by the codec.
				665
				666	These are the mapping codec APIs:
				667
				668	.. % --- Character Map Codecs -----------------------------------------------
				669
				670	This codec is special in that it can be used to implement many different codecs
				671	(and this is in fact what was done to obtain most of the standard codecs
				672	included in the :mod:`encodings` package). The codec uses mapping to encode and
				673	decode characters.
				674
				675	Decoding mappings must map single string characters to single Unicode
				676	characters, integers (which are then interpreted as Unicode ordinals) or None
				677	(meaning "undefined mapping" and causing an error).
				678
				679	Encoding mappings must map single Unicode characters to single string
				680	characters, integers (which are then interpreted as Latin-1 ordinals) or None
				681	(meaning "undefined mapping" and causing an error).
				682
The mapping objects provided need only support the __getitem__ mapping
interface.

If a character lookup fails with a LookupError, the character is copied as-is,
meaning that its ordinal value will be interpreted as a Unicode or Latin-1
ordinal, respectively. Because of this, mappings only need to contain those
entries which map characters to different code points.
				690
				691
				692	.. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				693
				694	Create a Unicode object by decoding size bytes of the encoded string s using
				695	the given mapping object. Return NULL if an exception was raised by the
				696	codec. If mapping is NULL, Latin-1 decoding is done. Otherwise it can be a
				697	dictionary of byte mappings or a unicode string, which is treated as a lookup
				698	table. Byte values greater than or equal to the length of the string and
				699	U+FFFE "characters" are treated as "undefined mapping".
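
The lookup-table case described above can be modelled in plain C. This is only a sketch of the documented rule, not CPython's implementation: ``charmap_decode_model`` and ``UNDEFINED_MAPPING`` are hypothetical names, the table is a plain array of code points rather than a unicode string, and the only error behaviour modelled is stopping at the first undefined byte.

```c
#include <stddef.h>

#define UNDEFINED_MAPPING 0xFFFEu  /* code point treated as "undefined mapping" */

/* Hypothetical helper, not part of the Python C API: decode `size` bytes of
 * `s` through `table` (a lookup table of `table_len` code points) into `out`.
 * Returns the number of code points written, or -1 at the first byte that
 * has no defined mapping. `out` must have room for `size` entries. */
long
charmap_decode_model(const unsigned char *s, size_t size,
                     const unsigned int *table, size_t table_len,
                     unsigned int *out)
{
    size_t i;

    for (i = 0; i < size; i++) {
        unsigned int cp = UNDEFINED_MAPPING;

        if (table == NULL)
            cp = s[i];              /* no mapping: Latin-1, byte == ordinal */
        else if (s[i] < table_len)
            cp = table[s[i]];       /* byte is inside the table: look it up */

        if (cp == UNDEFINED_MAPPING)
            return -1;              /* outside the table, or mapped to U+FFFE */
        out[i] = cp;
    }
    return (long)size;
}
```

With a three-entry table, byte value 3 falls outside the table and byte values whose entry is U+FFFE are rejected in the same way; passing a null table copies every byte through as its Latin-1 ordinal.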
				700
				701
				702	.. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				703
				704	Encode the :ctype:`Py_UNICODE` buffer of the given size using the given
				705	mapping object and return a Python string object. Return NULL if an
				706	exception was raised by the codec.
				707
				708
				709	.. cfunction:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
				710
				711	Encode a Unicode object using the given mapping object and return the result
				712	as a Python string object. Error handling is "strict". Return NULL if an
				713	exception was raised by the codec.
				714
				715	The following codec API is special in that it maps Unicode to Unicode.
				716
				717
				718	.. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
				719
				720	Translate a :ctype:`Py_UNICODE` buffer of the given length by applying a
				721	character mapping table to it and return the resulting Unicode object. Return
				722	NULL when an exception was raised by the codec.
				723
				724	The mapping table must map Unicode ordinal integers to Unicode ordinal
				725	integers or None (causing deletion of the character).
				726
				727	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				728	and sequences work well. Unmapped character ordinals (ones which cause a
				729	:exc:`LookupError`) are left untouched and are copied as-is.
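
The three outcomes of a table lookup (replace, delete, leave untouched) can be illustrated with a small plain-C model. It is a sketch under assumptions, not the real implementation: ``translate_model``, ``struct translate_entry`` and ``DELETE_CHAR`` are hypothetical names, and a linear array of pairs stands in for a Python mapping object.

```c
#include <stddef.h>

#define DELETE_CHAR (-1L)   /* stands in for a table entry of None */

struct translate_entry {    /* hypothetical stand-in for one table item */
    unsigned int from;      /* source ordinal */
    long to;                /* target ordinal, or DELETE_CHAR */
};

/* Hypothetical helper, not part of the Python C API: translate `size`
 * ordinals from `s` into `out` and return how many were written.
 * `out` must have room for `size` entries. */
size_t
translate_model(const unsigned int *s, size_t size,
                const struct translate_entry *table, size_t table_len,
                unsigned int *out)
{
    size_t i, j, n = 0;

    for (i = 0; i < size; i++) {
        long target = (long)s[i];   /* default: lookup failed, copy as-is */

        for (j = 0; j < table_len; j++) {
            if (table[j].from == s[i]) {
                target = table[j].to;
                break;
            }
        }
        if (target != DELETE_CHAR)  /* a None entry deletes the character */
            out[n++] = (unsigned int)target;
    }
    return n;
}
```

Translating ``abc`` with a table that maps ``a`` to ``A`` and ``b`` to the delete marker yields ``Ac``: the first character is replaced, the second is dropped, and the third has no entry and is copied unchanged.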
				730
				731	These are the MBCS codec APIs. They are currently only available on Windows and
				732	use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
				733	DBCS) is a class of encodings, not just one. The target encoding is defined by
				734	the user settings on the machine running the codec.
				735
				736	.. % --- MBCS codecs for Windows --------------------------------------------
				737
				738
				739	.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
				740
				741	Create a Unicode object by decoding size bytes of the MBCS encoded string s.
				742	Return NULL if an exception was raised by the codec.
				743
				744
				745	.. cfunction:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
				746
				747	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeMBCS`. If
				748	consumed is not NULL, :cfunc:`PyUnicode_DecodeMBCSStateful` will not decode
				749	a trailing lead byte, and the number of bytes that have been decoded will be
				750	stored in consumed.
				751
				752
				753	.. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				754
				755	Encode the :ctype:`Py_UNICODE` buffer of the given size using MBCS and return a
				756	Python string object. Return NULL if an exception was raised by the codec.
				757
				758
				759	.. cfunction:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
				760
				761	Encode a Unicode object using MBCS and return the result as a Python string
				762	object. Error handling is "strict". Return NULL if an exception was raised
				763	by the codec.
				764
				765	.. % --- Methods & Slots ----------------------------------------------------
				766
				767
				768	.. _unicodemethodsandslots:
				769
				770	Methods and Slot Functions
				771	^^^^^^^^^^^^^^^^^^^^^^^^^^
				772
				773	The following APIs are capable of handling Unicode objects and strings on input
				774	(we refer to them as strings in the descriptions) and return Unicode objects or
				775	integers as appropriate.
				776
				777	They all return NULL or ``-1`` if an exception occurs.
				778
				779
				780	.. cfunction:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
				781
				782	Concatenate two strings, giving a new Unicode string.
				783
				784
				785	.. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
				786
				787	Split a string giving a list of Unicode strings. If sep is NULL, splitting
				788	will be done at all whitespace substrings. Otherwise, splits occur at the given
				789	separator. At most maxsplit splits will be done. If negative, no limit is
				790	set. Separators are not included in the resulting list.
				791
				792
				793	.. cfunction:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
				794
				795	Split a Unicode string at line breaks, returning a list of Unicode strings.
				796	CRLF is considered to be one line break. If keepend is 0, the line break
				797	characters are not included in the resulting strings.
				798
				799
				800	.. cfunction:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
				801
				802	Translate a string by applying a character mapping table to it and return the
				803	resulting Unicode object.
				804
				805	The mapping table must map Unicode ordinal integers to Unicode ordinal integers
				806	or None (causing deletion of the character).
				807
				808	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				809	and sequences work well. Unmapped character ordinals (ones which cause a
				810	:exc:`LookupError`) are left untouched and are copied as-is.
				811
				812	errors has the usual meaning for codecs. It may be NULL which indicates to
				813	use the default error handling.
				814
				815
				816	.. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
				817
				818	Join a sequence of strings using the given separator and return the resulting
				819	Unicode string.
				820
				821
				822	.. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				823
				824	Return 1 if substr matches str[start:end] at the given tail end
				825	(direction == -1 means to do a prefix match, direction == 1 a suffix match),
				826	0 otherwise. Return ``-1`` if an error occurred.
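
The meaning of the direction argument can be shown with a plain-C model that works on byte strings. This is a sketch, not the real function: ``tailmatch_model`` is a hypothetical name, it assumes ``0 <= start <= end <= strlen(str)``, and it leaves out any index normalization and the error return.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Python C API: return 1 if `substr`
 * matches the slice str[start:end] at the chosen end, 0 otherwise.
 * Assumes 0 <= start <= end <= strlen(str). */
int
tailmatch_model(const char *str, size_t start, size_t end,
                const char *substr, int direction)
{
    size_t sublen = strlen(substr);

    if (end - start < sublen)
        return 0;                   /* slice is too short to match */
    if (direction > 0)              /* 1: suffix match at the end of the slice */
        return memcmp(str + end - sublen, substr, sublen) == 0;
    /* -1: prefix match at the start of the slice */
    return memcmp(str + start, substr, sublen) == 0;
}
```

Note that the match is made against the slice, not the whole string: ``world`` is a suffix of ``hello world`` taken over ``[0:11]`` but not over ``[0:5]``.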
				827
				828
				829	.. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				830
				831	Return the first position of substr in str[start:end] using the given
				832	direction (direction == 1 means to do a forward search, direction == -1 a
				833	backward search). The return value is the index of the first match; a value of
				834	``-1`` indicates that no match was found, and ``-2`` indicates that an error
				835	occurred and an exception has been set.
				836
				837
				838	.. cfunction:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
				839
				840	Return the number of non-overlapping occurrences of substr in
				841	``str[start:end]``. Return ``-1`` if an error occurred.
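
"Non-overlapping" means that once a match is counted, the search resumes after it. A plain-C model on byte strings makes this concrete. It is a sketch only: ``count_model`` is a hypothetical name, it assumes ``0 <= start <= end <= strlen(str)`` and a non-empty ``substr``, and it does not model the error return.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Python C API: count non-overlapping
 * occurrences of `substr` in the slice str[start:end].
 * Assumes 0 <= start <= end <= strlen(str) and a non-empty `substr`. */
size_t
count_model(const char *str, size_t start, size_t end, const char *substr)
{
    size_t sublen = strlen(substr);
    size_t i = start, n = 0;

    assert(sublen > 0);             /* empty needles are outside this sketch */
    while (i + sublen <= end) {
        if (memcmp(str + i, substr, sublen) == 0) {
            n++;
            i += sublen;            /* skip the match so counts never overlap */
        }
        else {
            i++;
        }
    }
    return n;
}
```

Under this rule ``aa`` occurs twice in ``aaaa`` and once in ``aaa``, since the middle ``aa`` of each overlaps a match that was already counted.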
				842
				843
				844	.. cfunction:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
				845
				846	Replace at most maxcount occurrences of substr in str with replstr and
				847	return the resulting Unicode object. maxcount == -1 means replace all
				848	occurrences.
				849
				850
				851	.. cfunction:: int PyUnicode_Compare(PyObject *left, PyObject *right)
				852
				853	Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
				854	respectively.
				855
				856
				857	.. cfunction:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
				858
				859	Rich compare two unicode strings and return one of the following:
				860
				861	* ``NULL`` in case an exception was raised
				862	* :const:`Py_True` or :const:`Py_False` for successful comparisons
				863	* :const:`Py_NotImplemented` in case the type combination is unknown
				864
				865	Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
				866	:exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
				867	with a :exc:`UnicodeDecodeError`.
				868
				869	Possible values for op are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
				870	:const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
				871
				872
				873	.. cfunction:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
				874
				875	Return a new string object from format and args; this is analogous to
				876	``format % args``. The args argument must be a tuple.
				877
				878
				879	.. cfunction:: int PyUnicode_Contains(PyObject *container, PyObject *element)
				880
				881	Check whether element is contained in container and return true or false
				882	accordingly.
				883
				884	element has to coerce to a one element Unicode string. ``-1`` is returned if
				885	there was an error.
				886
				887
				888	.. cfunction:: void PyUnicode_InternInPlace(PyObject **string)
				889
				890	Intern the argument *\*string* in place. The argument must be the address of a
				891	pointer variable pointing to a Python unicode string object. If there is an
				892	existing interned string that is the same as *\*string*, it sets *\*string* to
				893	it (decrementing the reference count of the old string object and incrementing
				894	the reference count of the interned string object), otherwise it leaves
				895	*\*string* alone and interns it (incrementing its reference count).
				896	(Clarification: even though there is a lot of talk about reference counts, think
				897	of this function as reference-count-neutral; you own the object after the call
				898	if and only if you owned it before the call.)
				899
				900
				901	.. cfunction:: PyObject* PyUnicode_InternFromString(const char *v)
				902
				903	A combination of :cfunc:`PyUnicode_FromString` and
				904	:cfunc:`PyUnicode_InternInPlace`, returning either a new unicode string object
				905	that has been interned, or a new ("owned") reference to an earlier interned
				906	string object with the same value.
				907