.. highlightlang:: c
				2
				3	.. _unicodeobjects:
				4
				5	Unicode Objects and Codecs
				6	--------------------------
				7
				8	.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
				9
				10	Unicode Objects
				11	^^^^^^^^^^^^^^^
				12
				13
				14	These are the basic Unicode object types used for the Unicode implementation in
				15	Python:
				16
				17	.. % --- Unicode Type -------------------------------------------------------
				18
				19
.. ctype:: Py_UNICODE

   This type represents the storage type which Python uses internally as the
   basis for holding Unicode ordinals. Python's default builds use a 16-bit type
   for :ctype:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
   possible to build a UCS4 version of Python (most recent Linux distributions come
   with UCS4 builds of Python). These builds then use a 32-bit type for
   :ctype:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
   where :ctype:`wchar_t` is available and compatible with the chosen Python
   Unicode build variant, :ctype:`Py_UNICODE` is a typedef alias for
   :ctype:`wchar_t` to enhance native platform compatibility. On all other
   platforms, :ctype:`Py_UNICODE` is a typedef alias for either :ctype:`unsigned
   short` (UCS2) or :ctype:`unsigned long` (UCS4).

   Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
   this in mind when writing extensions or interfaces.
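The difference between the two build variants shows up for code points above U+FFFF. The following is a minimal standalone sketch, not CPython code: ``store_narrow`` and ``store_wide`` are hypothetical helpers that only model how a 16-bit and a 32-bit buffer would hold the same code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the Python C API): store one code point
 * as 16-bit units, the way a UCS2 (narrow) buffer would hold it.
 * Returns the number of units written (1 or 2). */
static size_t store_narrow(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: a UCS4 (wide) buffer holds every code point in
 * exactly one 32-bit unit. */
static size_t store_wide(uint32_t cp, uint32_t out[1])
{
    assert(cp <= 0x10FFFF);
    out[0] = cp;
    return 1;
}
```

Because the same text can occupy a different number of units in each variant, sizes and buffer layouts computed for one build cannot be assumed to hold for the other.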
				36
				37
				38	.. ctype:: PyUnicodeObject
				39
				40	This subtype of :ctype:`PyObject` represents a Python Unicode object.
				41
				42
				43	.. cvar:: PyTypeObject PyUnicode_Type
				44
				45	This instance of :ctype:`PyTypeObject` represents the Python Unicode type. It
				46	is exposed to Python code as ``unicode`` and ``types.UnicodeType``.
				47
				48	The following APIs are really C macros and can be used to do fast checks and to
				49	access internal read-only data of Unicode objects:
				50
				51
				52	.. cfunction:: int PyUnicode_Check(PyObject *o)
				53
				54	Return true if the object o is a Unicode object or an instance of a Unicode
				55	subtype.
				56
				57	.. versionchanged:: 2.2
				58	Allowed subtypes to be accepted.
				59
				60
				61	.. cfunction:: int PyUnicode_CheckExact(PyObject *o)
				62
				63	Return true if the object o is a Unicode object, but not an instance of a
				64	subtype.
				65
				66	.. versionadded:: 2.2
				67
				68
.. cfunction:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the object. *o* has to be a :ctype:`PyUnicodeObject` (not
   checked).

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. cfunction:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the object's internal buffer in bytes. *o* has to be a
   :ctype:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.
				87
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	88
				89	.. cfunction:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
				90
				91	Return a pointer to the internal :ctype:`Py_UNICODE` buffer of the object. o
				92	has to be a :ctype:`PyUnicodeObject` (not checked).
				93
				94
				95	.. cfunction:: const char* PyUnicode_AS_DATA(PyObject *o)
				96
				97	Return a pointer to the internal buffer of the object. o has to be a
				98	:ctype:`PyUnicodeObject` (not checked).
				99
Christian Heimes	3b718a7	2008-02-14 12:47:33 +0000	[diff] [blame]	100
				101	.. cfunction:: int PyUnicode_ClearFreeList(void)
				102
				103	Clear the free list. Return the total number of freed items.
				104
				105	.. versionadded:: 2.6
				106
Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.
				110
				111	.. % --- Unicode character properties ---------------------------------------
				112
				113
				114	.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
				115
				116	Return 1 or 0 depending on whether ch is a whitespace character.
				117
				118
				119	.. cfunction:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
				120
				121	Return 1 or 0 depending on whether ch is a lowercase character.
				122
				123
				124	.. cfunction:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
				125
				126	Return 1 or 0 depending on whether ch is an uppercase character.
				127
				128
				129	.. cfunction:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
				130
				131	Return 1 or 0 depending on whether ch is a titlecase character.
				132
				133
				134	.. cfunction:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
				135
				136	Return 1 or 0 depending on whether ch is a linebreak character.
				137
				138
				139	.. cfunction:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
				140
				141	Return 1 or 0 depending on whether ch is a decimal character.
				142
				143
				144	.. cfunction:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
				145
				146	Return 1 or 0 depending on whether ch is a digit character.
				147
				148
				149	.. cfunction:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
				150
				151	Return 1 or 0 depending on whether ch is a numeric character.
				152
				153
				154	.. cfunction:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
				155
				156	Return 1 or 0 depending on whether ch is an alphabetic character.
				157
				158
				159	.. cfunction:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
				160
				161	Return 1 or 0 depending on whether ch is an alphanumeric character.
				162
				163	These APIs can be used for fast direct character conversions:
				164
				165
				166	.. cfunction:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
				167
				168	Return the character ch converted to lower case.
				169
				170
				171	.. cfunction:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
				172
				173	Return the character ch converted to upper case.
				174
				175
				176	.. cfunction:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
				177
				178	Return the character ch converted to title case.
				179
				180
				181	.. cfunction:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
				182
				183	Return the character ch converted to a decimal positive integer. Return
				184	``-1`` if this is not possible. This macro does not raise exceptions.
				185
				186
				187	.. cfunction:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
				188
				189	Return the character ch converted to a single digit integer. Return ``-1`` if
				190	this is not possible. This macro does not raise exceptions.
				191
				192
				193	.. cfunction:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
				194
				195	Return the character ch converted to a double. Return ``-1.0`` if this is not
				196	possible. This macro does not raise exceptions.
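Since the conversion macros signal failure through a ``-1`` return value instead of an exception, callers have to test the result themselves. A minimal standalone sketch of that pattern follows. ``to_decimal`` is a hypothetical stand-in for :cfunc:`Py_UNICODE_TODECIMAL` that knows only two digit ranges, and ``parse_decimal`` is an illustrative caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Py_UNICODE_TODECIMAL covering only two digit ranges
 * (an assumption made for this sketch). Like the macro, it returns -1
 * on failure instead of raising an error. */
static int to_decimal(uint32_t ch)
{
    if (ch >= 0x0030 && ch <= 0x0039)   /* ASCII digits */
        return (int)(ch - 0x0030);
    if (ch >= 0x0660 && ch <= 0x0669)   /* Arabic-Indic digits */
        return (int)(ch - 0x0660);
    return -1;
}

/* Parse the leading decimal digits of s. Stops at the first character
 * for which to_decimal() reports -1 and stores the number of characters
 * consumed in *used. */
static long parse_decimal(const uint32_t *s, size_t size, size_t *used)
{
    long value = 0;
    size_t i;

    assert(s != NULL && used != NULL);
    for (i = 0; i < size; i++) {
        int d = to_decimal(s[i]);
        if (d < 0)
            break;
        value = value * 10 + d;
    }
    *used = i;
    return value;
}
```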
				197
				198	To create Unicode objects and access their basic sequence properties, use these
				199	APIs:
				200
				201	.. % --- Plain Py_UNICODE ---------------------------------------------------
				202
				203
.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the :ctype:`Py_UNICODE` buffer *u* of the given
   size. *u* may be *NULL*, which causes the contents to be undefined; in that
   case it is the user's responsibility to fill in the needed data. Otherwise
   the buffer is copied into the new object. If the buffer is not *NULL*, the
   return value might be a shared object. Therefore, modification of the
   resulting Unicode object is only allowed when *u* is *NULL*.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal
   :ctype:`Py_UNICODE` buffer, or *NULL* if *unicode* is not a Unicode object.


.. cfunction:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the length of the Unicode object.

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.

   String and other char buffer compatible objects are decoded according to the
   given *encoding* and using the error handling defined by *errors*. Both can be
   *NULL* to have the interface use the default values (see the next section for
   details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns *NULL* if there was an error. The caller is responsible for
   decref'ing the returned objects.
				248
				249
				250	.. cfunction:: PyObject* PyUnicode_FromObject(PyObject *obj)
				251
				252	Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
				253	throughout the interpreter whenever coercion to Unicode is needed.
				254
				255	If the platform supports :ctype:`wchar_t` and provides a header file wchar.h,
				256	Python can interface directly to this type using the following functions.
				257	Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
				258	the system's :ctype:`wchar_t`.
				259
				260	.. % --- wchar_t support for platforms which support it ---------------------
				261
				262
.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :ctype:`wchar_t` buffer *w* of the given size.
   Return *NULL* on failure.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :ctype:`wchar_t` buffer *w*. At most
   *size* :ctype:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :ctype:`wchar_t` characters
   copied or -1 in case of an error. Note that the resulting :ctype:`wchar_t`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :ctype:`wchar_t` string is 0-terminated in case this is
   required by the application.

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type and used an :ctype:`int`
      type for *size*. This might require changes in your code for properly
      supporting 64-bit systems.
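One way for a caller to guarantee 0-termination is to reserve one slot in the buffer and write the terminator after the copy. The sketch below is standalone and does not call the Python API. ``copy_wide`` is a hypothetical stand-in that only mimics the contract described above (copy at most *size* characters, never terminate), and ``copy_wide_terminated`` shows the caller-side pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Stand-in with the contract described above: copy at most `size`
 * characters and never write a terminator. This is NOT the real API,
 * only a model of it for illustration. */
static long copy_wide(const wchar_t *src, size_t srclen,
                      wchar_t *w, size_t size)
{
    size_t n = srclen < size ? srclen : size;

    assert(src != NULL && w != NULL);
    wmemcpy(w, src, n);
    return (long)n;
}

/* Caller-side pattern: pass bufsize - 1 as the size so that one slot
 * stays free, then terminate explicitly at the returned length. */
static long copy_wide_terminated(const wchar_t *src, size_t srclen,
                                 wchar_t *buf, size_t bufsize)
{
    long n;

    assert(bufsize > 0);
    n = copy_wide(src, srclen, buf, bufsize - 1);
    if (n >= 0)
        buf[n] = L'\0';
    return n;
}
```

With the real function the same idea would read as a call with ``size`` set to the buffer length minus one, followed by ``w[n] = 0`` when the returned ``n`` is not -1.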
				287
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	288
				289	.. _builtincodecs:
				290
				291	Built-in Codecs
				292	^^^^^^^^^^^^^^^
				293
				294	Python provides a set of builtin codecs which are written in C for speed. All of
				295	these codecs are directly usable via the following functions.
				296
Many of the following APIs take two arguments, *encoding* and *errors*. These
parameters have the same semantics as the ones of the built-in :func:`unicode`
Unicode object constructor.

Setting *encoding* to *NULL* causes the default encoding to be used, which is
ASCII. The file system calls should use :cdata:`Py_FileSystemDefaultEncoding`
as the encoding for file names. This variable should be treated as read-only: on
some systems, it will be a pointer to a static string, on others, it will change
at run-time (such as when the application invokes setlocale).

Error handling is set by *errors*, which may also be set to *NULL*, meaning to use
the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface. For simplicity, only deviations from
the following generic APIs are documented.
				313
				314	These are the generic codec APIs:
				315
				316	.. % --- Generic Codecs -----------------------------------------------------
				317
				318
.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`unicode` built-in function. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer *s* of the given size and return a Python
   string object. *encoding* and *errors* have the same meaning as the parameters
   of the same name in the Unicode :meth:`encode` method. The codec to be used is
   looked up using the Python codec registry. Return *NULL* if an exception was
   raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)

   Encode a Unicode object and return the result as a Python string object.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the Unicode :meth:`encode` method. The codec to be used is looked up using
   the Python codec registry. Return *NULL* if an exception was raised by the
   codec.
				352
				353	These are the UTF-8 codec APIs:
				354
				355	.. % --- UTF-8 Codecs -------------------------------------------------------
				356
				357
.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
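To illustrate what "trailing incomplete byte sequence" means for a chunk of streamed input, here is a minimal standalone sketch. It is not the CPython decoder: ``utf8_seq_len`` and ``utf8_complete_prefix`` are hypothetical helpers that only count how many leading bytes form complete sequences, which is the kind of value a stateful decoder would report through *consumed*. As a simplification, the sketch does not validate continuation bytes and simply stops at a byte that cannot start a sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Length of the UTF-8 sequence introduced by lead byte b, or 0 if b
 * cannot start a sequence. */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if (b >= 0xC2 && b <= 0xDF)
        return 2;
    if (b >= 0xE0 && b <= 0xEF)
        return 3;
    if (b >= 0xF0 && b <= 0xF4)
        return 4;
    return 0;
}

/* Number of bytes at the start of s that form complete sequences.
 * A sequence cut off by the end of the buffer is left out, so the
 * caller can carry those bytes over to the next chunk. */
static size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = 0;

    assert(s != NULL || size == 0);
    while (i < size) {
        size_t len = utf8_seq_len(s[i]);
        if (len == 0 || len > size - i)
            break;
        i += len;
    }
    return i;
}
```

A caller reading from a stream would decode the first ``utf8_complete_prefix(...)`` bytes and prepend the remainder to the next chunk.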
				380
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	381
.. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer *s* of the given size using UTF-8 and
   return a Python string object. Return *NULL* if an exception was raised by the
   codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
				390
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	391
				392	.. cfunction:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
				393
				394	Encode a Unicode object using UTF-8 and return the result as Python string
				395	object. Error handling is "strict". Return NULL if an exception was raised
				396	by the codec.
				397
				398	These are the UTF-32 codec APIs:
				399
				400	.. % --- UTF-32 Codecs ------------------------------------------------------ */
				401
				402
.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   and then switches if the first four bytes of the input data are a byte order mark
   (BOM) and the specified byte order is native order. This BOM is not copied into
   the resulting Unicode string. After completion, ``*byteorder`` is set to the
   current byte order at the end of input data.

   In a narrow build, code points outside the BMP will be decoded as surrogate pairs.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6


.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
   by four) as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.6


.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-32 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is not defined, surrogate pairs will be output
   as a single code point.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6
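The statement that surrogate pairs are "output as a single code point" can be made concrete with the standard UTF-16 arithmetic. The following standalone sketch is not CPython code, and ``combine_surrogates`` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a high/low surrogate pair taken from a
 * 16-bit buffer into the single code point it represents, which is the
 * value a UTF-32 encoder would then write as one 32-bit unit. */
static uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000
           + (((uint32_t)high - 0xD800) << 10)
           + ((uint32_t)low - 0xDC00);
}
```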
				460
				461
				462	.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				463
				464	Return a Python string using the UTF-32 encoding in native byte order. The
				465	string always starts with a BOM mark. Error handling is "strict". Return
				466	NULL if an exception was raised by the codec.
				467
				468	.. versionadded:: 2.6
				469
				470
				471	These are the UTF-16 codec APIs:
				472
				473	.. % --- UTF-16 Codecs ------------------------------------------------------ */
				474
				475
.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   and then switches if the first two bytes of the input data are a byte order mark
   (BOM) and the specified byte order is native order. This BOM is not copied into
   the resulting Unicode string. After completion, ``*byteorder`` is set to the
   current byte order at the end of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
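The BOM rule for *byteorder* can be sketched in isolation. The code below is a standalone illustration, not the CPython decoder. ``resolve_utf16_bom`` is a hypothetical helper that applies only the rule stated above: a BOM is honoured when the requested order is ``0`` (native), and it is then skipped. What the sketch does when no BOM is present (leaving ``*byteorder`` at ``0``) is an assumption of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: inspect the start of a UTF-16 byte buffer.
 * If *byteorder is 0 (native) and the buffer starts with a BOM, switch
 * *byteorder to -1 (little endian) or 1 (big endian).
 * Returns the number of bytes to skip before decoding (0 or 2). */
static size_t resolve_utf16_bom(const unsigned char *s, size_t size,
                                int *byteorder)
{
    assert(s != NULL && byteorder != NULL);
    if (*byteorder != 0 || size < 2)
        return 0;           /* explicit order requested, or no room for a BOM */
    if (s[0] == 0xFF && s[1] == 0xFE) {
        *byteorder = -1;    /* U+FEFF stored little endian */
        return 2;
    }
    if (s[0] == 0xFE && s[1] == 0xFF) {
        *byteorder = 1;     /* U+FEFF stored big endian */
        return 2;
    }
    return 0;               /* no BOM: keep decoding in native order */
}
```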
				501
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	502
.. cfunction:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size* and an :ctype:`int *`
      type for *consumed*. This might require changes in your code for
      properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-16 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is defined, a single :ctype:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :ctype:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
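The effect of the *byteorder* argument on the output bytes can be shown with a small standalone sketch. It is not the CPython encoder: ``encode_utf16``, ``put_unit`` and ``native_is_little`` are hypothetical helpers, and the input is assumed to be 16-bit (UCS-2) units, so no surrogate pairs are synthesized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Runtime check of the host byte order. */
static int native_is_little(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Write one 16-bit unit as two bytes in the requested order. */
static void put_unit(unsigned char *out, uint16_t u, int little)
{
    out[0] = (unsigned char)(little ? (u & 0xFF) : (u >> 8));
    out[1] = (unsigned char)(little ? (u >> 8) : (u & 0xFF));
}

/* Encode `size` 16-bit units from s into out, which must have room for
 * 2 * (size + 1) bytes. byteorder follows the table above: -1 little
 * endian, 1 big endian, 0 native order with a leading BOM.
 * Returns the number of bytes written. */
static size_t encode_utf16(const uint16_t *s, size_t size, int byteorder,
                           unsigned char *out)
{
    int little = byteorder < 0 || (byteorder == 0 && native_is_little());
    size_t n = 0, i;

    assert(s != NULL && out != NULL);
    if (byteorder == 0) {
        put_unit(out, 0xFEFF, little);   /* BOM only in native mode */
        n = 2;
    }
    for (i = 0; i < size; i++, n += 2)
        put_unit(out + n, s[i], little);
    return n;
}
```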
				541
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	542
				543	.. cfunction:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				544
				545	Return a Python string using the UTF-16 encoding in native byte order. The
				546	string always starts with a BOM mark. Error handling is "strict". Return
				547	NULL if an exception was raised by the codec.
				548
				549	These are the "Unicode Escape" codec APIs:
				550
				551	.. % --- Unicode-Escape Codecs ----------------------------------------------
				552
				553
.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
				562
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	563
				564	.. cfunction:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
				565
				566	Encode the :ctype:`Py_UNICODE` buffer of the given size using Unicode-Escape and
				567	return a Python string object. Return NULL if an exception was raised by the
				568	codec.
				569
Jeroen Ruigrok van der Werven	0051bf3	2009-04-29 08:00:05 +0000	[diff] [blame]	570	.. versionchanged:: 2.5
				571	This function used an :ctype:`int` type for size. This might require
				572	changes in your code for properly supporting 64-bit systems.
				573
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	574
				575	.. cfunction:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				576
				577	Encode a Unicode object using Unicode-Escape and return the result as Python
				578	string object. Error handling is "strict". Return NULL if an exception was
				579	raised by the codec.
				580
				581	These are the "Raw Unicode Escape" codec APIs:
				582
				583	.. % --- Raw-Unicode-Escape Codecs ------------------------------------------
				584
				585
.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer *s* of the given size using
   Raw-Unicode-Escape and return a Python string object. Return *NULL* if an
   exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
				605
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	606
				607	.. cfunction:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				608
				609	Encode a Unicode object using Raw-Unicode-Escape and return the result as
				610	Python string object. Error handling is "strict". Return NULL if an exception
				611	was raised by the codec.
				612
These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
ordinals and only these are accepted by the codecs during encoding.

.. % --- Latin-1 Codecs -----------------------------------------------------


.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer *s* of the given size using Latin-1 and
   return a Python string object. Return *NULL* if an exception was raised by the
   codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
				637
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	638
				639	.. cfunction:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				640
				641	Encode a Unicode object using Latin-1 and return the result as Python string
				642	object. Error handling is "strict". Return NULL if an exception was raised
				643	by the codec.
				644
				645	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				646	codes generate errors.
				647
				648	.. % --- ASCII Codecs -------------------------------------------------------
				649
				650
				651	.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char s, Py_ssize_t size, const char errors)
				652
				653	Create a Unicode object by decoding size bytes of the ASCII encoded string
				654	s. Return NULL if an exception was raised by the codec.
				655
Jeroen Ruigrok van der Werven	0051bf3	2009-04-29 08:00:05 +0000	[diff] [blame]	656	.. versionchanged:: 2.5
				657	This function used an :ctype:`int` type for size. This might require
				658	changes in your code for properly supporting 64-bit systems.
				659
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	660
				661	.. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE s, Py_ssize_t size, const char errors)
				662
				663	Encode the :ctype:`Py_UNICODE` buffer of the given size using ASCII and return a
				664	Python string object. Return NULL if an exception was raised by the codec.
				665
.. versionchanged:: 2.5
				667	This function used an :ctype:`int` type for size. This might require
				668	changes in your code for properly supporting 64-bit systems.
				669
				671	.. cfunction:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				672
Encode a Unicode object using ASCII and return the result as a Python string
object. Error handling is "strict". Return NULL if an exception was raised
by the codec.
				676
				677	These are the mapping codec APIs:
				678
				679	.. % --- Character Map Codecs -----------------------------------------------
				680
This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.
				685
				686	Decoding mappings must map single string characters to single Unicode
				687	characters, integers (which are then interpreted as Unicode ordinals) or None
				688	(meaning "undefined mapping" and causing an error).
				689
				690	Encoding mappings must map single Unicode characters to single string
				691	characters, integers (which are then interpreted as Latin-1 ordinals) or None
				692	(meaning "undefined mapping" and causing an error).
				693
The mapping objects provided need only support the ``__getitem__`` mapping
interface.

If a character lookup fails with a LookupError, the character is copied as-is,
meaning that its ordinal value will be interpreted as a Unicode or Latin-1
ordinal, respectively. Because of this, mappings only need to contain those
entries which map characters to different code points.
				701
				702
.. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)

Create a Unicode object by decoding *size* bytes of the encoded string *s* using
the given *mapping* object. Return NULL if an exception was raised by the
codec. If *mapping* is NULL, Latin-1 decoding is done. Otherwise it can be a
dictionary mapping bytes, or a Unicode string, which is treated as a lookup table.
Byte values greater than the length of the string and U+FFFE "characters" are
treated as "undefined mapping".
				711
				712	.. versionchanged:: 2.4
				713	Allowed unicode string as mapping argument.
				714
.. versionchanged:: 2.5
				716	This function used an :ctype:`int` type for size. This might require
				717	changes in your code for properly supporting 64-bit systems.
				718
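The lookup-table rule for :cfunc:`PyUnicode_DecodeCharmap` can be pictured with
a small standalone model. This is only a sketch of the documented behaviour
under "strict" error handling, not CPython's implementation; the names
``charmap_decode_model`` and ``CHARMAP_UNDEFINED`` are hypothetical, and the
table is a plain C array standing in for the Unicode string.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker for "undefined mapping", mirroring the U+FFFE rule. */
#define CHARMAP_UNDEFINED 0xFFFEu

/*
 * Decode `size` bytes from `s` into `out`, using `table` (of length
 * `table_len`) as a lookup table indexed by byte value.
 * Returns the number of code points written, or -1 if a byte has no
 * defined mapping. `out` must have room for at least `size` entries.
 */
ptrdiff_t
charmap_decode_model(const unsigned char *s, size_t size,
                     const uint32_t *table, size_t table_len,
                     uint32_t *out)
{
    size_t i;

    for (i = 0; i < size; i++) {
        unsigned char byte = s[i];

        /* Byte values beyond the end of the table are undefined. */
        if (byte >= table_len)
            return -1;
        /* Entries holding U+FFFE are undefined as well. */
        if (table[byte] == CHARMAP_UNDEFINED)
            return -1;
        out[i] = table[byte];
    }
    return (ptrdiff_t)size;
}
```

With a three-entry table ``{0x41, 0xFFFE, 0x20AC}``, bytes 0 and 2 decode to
U+0041 and U+20AC, while bytes 1 and 3 are both rejected in this model.
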

.. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				721
				722	Encode the :ctype:`Py_UNICODE` buffer of the given size using the given
				723	mapping object and return a Python string object. Return NULL if an
				724	exception was raised by the codec.
				725
.. versionchanged:: 2.5
				727	This function used an :ctype:`int` type for size. This might require
				728	changes in your code for properly supporting 64-bit systems.
				729
.. cfunction:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
				732
Encode a Unicode object using the given mapping object and return the result
as a Python string object. Error handling is "strict". Return NULL if an
exception was raised by the codec.
				736
The following codec API is special in that it maps Unicode to Unicode.
				738
				739
.. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
				741
				742	Translate a :ctype:`Py_UNICODE` buffer of the given length by applying a
				743	character mapping table to it and return the resulting Unicode object. Return
				744	NULL when an exception was raised by the codec.
				745
				746	The mapping table must map Unicode ordinal integers to Unicode ordinal
				747	integers or None (causing deletion of the character).
				748
				749	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				750	and sequences work well. Unmapped character ordinals (ones which cause a
				751	:exc:`LookupError`) are left untouched and are copied as-is.
				752
.. versionchanged:: 2.5
				754	This function used an :ctype:`int` type for size. This might require
				755	changes in your code for properly supporting 64-bit systems.
				756
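The three outcomes of a table lookup during translation (mapped, deleted,
unmapped) can be sketched in plain C. This is an illustrative model rather than
the CPython code: ``struct translate_entry`` and ``translate_model`` are
hypothetical names, and the ``remove`` flag stands in for a table entry of None.

```c
#include <stddef.h>
#include <stdint.h>

/* One table entry: map `from` to `to`, or delete `from` when `remove` is
 * non-zero (the C stand-in for a mapping to None). */
struct translate_entry {
    uint32_t from;
    uint32_t to;
    int remove;
};

/*
 * Translate `size` code points from `s` into `out` and return the number
 * of code points written. `out` must have room for `size` entries.
 */
size_t
translate_model(const uint32_t *s, size_t size,
                const struct translate_entry *table, size_t table_len,
                uint32_t *out)
{
    size_t i, j, n = 0;

    for (i = 0; i < size; i++) {
        const struct translate_entry *hit = NULL;

        for (j = 0; j < table_len; j++) {
            if (table[j].from == s[i]) {
                hit = &table[j];
                break;
            }
        }
        if (hit == NULL)
            out[n++] = s[i];       /* unmapped: copied as-is */
        else if (!hit->remove)
            out[n++] = hit->to;    /* mapped to another ordinal */
        /* otherwise the entry means "None": the character is dropped */
    }
    return n;
}
```

Translating ``abc`` with a table that maps ``a`` to ``x`` and deletes ``b``
yields ``xc`` in this model.
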

These are the MBCS codec APIs. They are currently only available on Windows and
				758	use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
				759	DBCS) is a class of encodings, not just one. The target encoding is defined by
				760	the user settings on the machine running the codec.
				761
				762	.. % --- MBCS codecs for Windows --------------------------------------------
				763
				764
.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
				766
				767	Create a Unicode object by decoding size bytes of the MBCS encoded string s.
				768	Return NULL if an exception was raised by the codec.
				769
.. versionchanged:: 2.5
				771	This function used an :ctype:`int` type for size. This might require
				772	changes in your code for properly supporting 64-bit systems.
				773
.. cfunction:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
				776
If *consumed* is NULL, behave like :cfunc:`PyUnicode_DecodeMBCS`. If
*consumed* is not NULL, :cfunc:`PyUnicode_DecodeMBCSStateful` will not decode
a trailing lead byte, and the number of bytes that have been decoded will be
stored in *consumed*.
				781
				782	.. versionadded:: 2.5
				783
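The "trailing lead byte" rule can be illustrated with a standalone sketch that
computes how many bytes of a buffer form complete characters. Everything here
is an assumption made for illustration: the lead-byte range is invented (a real
implementation would ask the operating system, for example through the Win32
``IsDBCSLeadByte`` function), and ``complete_prefix_model`` is a hypothetical
helper, not part of the Python API.

```c
#include <stddef.h>

/* Hypothetical lead-byte test; the range is illustrative only. */
int
is_lead_byte_model(unsigned char c)
{
    return c >= 0x81 && c <= 0xFE;
}

/*
 * Return how many of the `size` bytes in `s` form complete characters.
 * A lead byte at the very end of the buffer is left unconsumed so that
 * the caller can prepend it to the next chunk.
 */
size_t
complete_prefix_model(const unsigned char *s, size_t size)
{
    size_t i = 0;

    while (i < size) {
        if (is_lead_byte_model(s[i])) {
            if (i + 1 >= size)
                break;          /* trailing lead byte: do not consume */
            i += 2;             /* lead byte plus trail byte */
        }
        else {
            i += 1;             /* single-byte character */
        }
    }
    return i;
}
```

A caller that reads data in chunks would decode only the reported prefix and
carry the remaining byte over to the next chunk.
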
				784
.. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				786
				787	Encode the :ctype:`Py_UNICODE` buffer of the given size using MBCS and return a
				788	Python string object. Return NULL if an exception was raised by the codec.
				789
.. versionchanged:: 2.5
				791	This function used an :ctype:`int` type for size. This might require
				792	changes in your code for properly supporting 64-bit systems.
				793
				795	.. cfunction:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
				796
Encode a Unicode object using MBCS and return the result as a Python string
object. Error handling is "strict". Return NULL if an exception was raised
by the codec.
				800
				801	.. % --- Methods & Slots ----------------------------------------------------
				802
				803
				804	.. _unicodemethodsandslots:
				805
				806	Methods and Slot Functions
				807	^^^^^^^^^^^^^^^^^^^^^^^^^^
				808
				809	The following APIs are capable of handling Unicode objects and strings on input
				810	(we refer to them as strings in the descriptions) and return Unicode objects or
				811	integers as appropriate.
				812
				813	They all return NULL or ``-1`` if an exception occurs.
				814
				815
.. cfunction:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

Concatenate two strings, giving a new Unicode string.
				819
				820
.. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

Split a string, giving a list of Unicode strings. If *sep* is NULL, splitting
will be done at all whitespace substrings. Otherwise, splits occur at the given
separator. At most *maxsplit* splits will be done. If *maxsplit* is negative, no
limit is set. Separators are not included in the resulting list.
				827
.. versionchanged:: 2.5
				829	This function used an :ctype:`int` type for maxsplit. This might require
				830	changes in your code for properly supporting 64-bit systems.
				831
				833	.. cfunction:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
				834
Split a Unicode string at line breaks, returning a list of Unicode strings.
CRLF is considered to be one line break. If *keepend* is 0, the line break
characters are not included in the resulting strings.
				838
				839
.. cfunction:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
				841
				842	Translate a string by applying a character mapping table to it and return the
				843	resulting Unicode object.
				844
				845	The mapping table must map Unicode ordinal integers to Unicode ordinal integers
				846	or None (causing deletion of the character).
				847
				848	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				849	and sequences work well. Unmapped character ordinals (ones which cause a
				850	:exc:`LookupError`) are left untouched and are copied as-is.
				851
*errors* has the usual meaning for codecs. It may be NULL, which indicates
that the default error handling should be used.
				854
				855
.. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
				857
				858	Join a sequence of strings using the given separator and return the resulting
				859	Unicode string.
				860
				861
.. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				863
				864	Return 1 if substr matches str[start:end] at the given tail end
				865	(direction == -1 means to do a prefix match, direction == 1 a suffix match),
				866	0 otherwise. Return ``-1`` if an error occurred.
				867
.. versionchanged:: 2.5
				869	This function used an :ctype:`int` type for start and end. This
				870	might require changes in your code for properly supporting 64-bit
				871	systems.
				872
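The meaning of *direction* for :cfunc:`PyUnicode_Tailmatch` is easy to get
backwards, so here is a minimal standalone model of it. The sketch works on
plain arrays of code points and uses unsigned indices; it does not reproduce
the slice-style index adjustment of the real function, and ``tailmatch_model``
is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return 1 if `sub` matches str[start:end] at the chosen end, else 0.
 * direction <= 0 tests the prefix, direction > 0 tests the suffix.
 */
int
tailmatch_model(const uint32_t *str, size_t len,
                const uint32_t *sub, size_t sublen,
                size_t start, size_t end, int direction)
{
    if (end > len)
        end = len;
    if (start > end || sublen > end - start)
        return 0;               /* slice is too short to contain sub */
    if (direction > 0)
        start = end - sublen;   /* suffix match: compare at the tail */
    return memcmp(str + start, sub, sublen * sizeof *sub) == 0;
}
```

For the string ``hello``, ``he`` matches with ``direction == -1`` and ``lo``
matches with ``direction == 1`` in this model.
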

.. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				875
				876	Return the first position of substr in str[start:end] using the given
				877	direction (direction == 1 means to do a forward search, direction == -1 a
				878	backward search). The return value is the index of the first match; a value of
				879	``-1`` indicates that no match was found, and ``-2`` indicates that an error
				880	occurred and an exception has been set.
				881
.. versionchanged:: 2.5
				883	This function used an :ctype:`int` type for start and end. This
				884	might require changes in your code for properly supporting 64-bit
				885	systems.
				886
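Because :cfunc:`PyUnicode_Find` uses ``-1`` for "not found" and ``-2`` for
"error", callers have to check both values separately. The following standalone
sketch models that convention on plain arrays of code points. It is not the
CPython implementation: ``find_model`` is a hypothetical name, and treating a
NULL argument as the error case is an assumption made only to exercise the
``-2`` path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Search for `sub` in str[start:end]. direction > 0 searches forward,
 * anything else searches backward. Returns the index of the match,
 * -1 if there is no match, or -2 on error (here: a NULL argument).
 */
ptrdiff_t
find_model(const uint32_t *str, size_t len,
           const uint32_t *sub, size_t sublen,
           size_t start, size_t end, int direction)
{
    size_t i;

    if (str == NULL || sub == NULL)
        return -2;
    if (end > len)
        end = len;
    if (start > end || sublen > end - start)
        return -1;

    if (direction > 0) {
        for (i = start; i + sublen <= end; i++) {
            if (memcmp(str + i, sub, sublen * sizeof *sub) == 0)
                return (ptrdiff_t)i;
        }
    }
    else {
        /* Walk i from end - sublen down to start, inclusive. */
        for (i = end - sublen + 1; i-- > start; ) {
            if (memcmp(str + i, sub, sublen * sizeof *sub) == 0)
                return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

Searching ``abcabc`` for ``bc`` gives index 1 going forward and index 4 going
backward in this model.
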

.. cfunction:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
				889
				890	Return the number of non-overlapping occurrences of substr in
				891	``str[start:end]``. Return ``-1`` if an error occurred.
				892
.. versionchanged:: 2.5
				894	This function returned an :ctype:`int` type and used an :ctype:`int`
				895	type for start and end. This might require changes in your code for
				896	properly supporting 64-bit systems.
				897
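"Non-overlapping" means that a match consumes its characters before the search
continues. A minimal standalone model of that counting rule is sketched below;
``count_model`` is a hypothetical name, the arrays stand in for Unicode
objects, and the empty-substring case is left out of the model (it returns 0).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of `sub` in str[start:end].
 * An empty `sub` is outside the scope of this sketch and counts as 0.
 */
size_t
count_model(const uint32_t *str, size_t len,
            const uint32_t *sub, size_t sublen,
            size_t start, size_t end)
{
    size_t i, n = 0;

    if (end > len)
        end = len;
    if (sublen == 0 || start > end)
        return 0;

    i = start;
    while (i + sublen <= end) {
        if (memcmp(str + i, sub, sublen * sizeof *sub) == 0) {
            n++;
            i += sublen;    /* skip past the match: no overlap */
        }
        else {
            i++;
        }
    }
    return n;
}
```

Counting ``aa`` in ``aaaa`` therefore gives 2 rather than 3.
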

.. cfunction:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
				900
				901	Replace at most maxcount occurrences of substr in str with replstr and
				902	return the resulting Unicode object. maxcount == -1 means replace all
				903	occurrences.
				904
.. versionchanged:: 2.5
				906	This function used an :ctype:`int` type for maxcount. This might
				907	require changes in your code for properly supporting 64-bit systems.
				908
.. cfunction:: int PyUnicode_Compare(PyObject *left, PyObject *right)
				911
				912	Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
				913	respectively.
				914
				915
.. cfunction:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
				917
				918	Rich compare two unicode strings and return one of the following:
				919
				920	* ``NULL`` in case an exception was raised
				921	* :const:`Py_True` or :const:`Py_False` for successful comparisons
				922	* :const:`Py_NotImplemented` in case the type combination is unknown
				923
				924	Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
				925	:exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
				926	with a :exc:`UnicodeDecodeError`.
				927
				928	Possible values for op are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
				929	:const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
				930
				931
.. cfunction:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
				933
				934	Return a new string object from format and args; this is analogous to
				935	``format % args``. The args argument must be a tuple.
				936
				937
.. cfunction:: int PyUnicode_Contains(PyObject *container, PyObject *element)
				939
				940	Check whether element is contained in container and return true or false
				941	accordingly.
				942
				943	element has to coerce to a one element Unicode string. ``-1`` is returned if
				944	there was an error.