.. highlightlang:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>

Unicode Objects
^^^^^^^^^^^^^^^


Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:


.. c:type:: Py_UNICODE

   This type represents the storage type which is used by Python internally as
   the basis for holding Unicode ordinals. Python's default builds use a 16-bit
   type for :c:type:`Py_UNICODE` and store Unicode values internally as UCS2. It
   is also possible to build a UCS4 version of Python (most recent Linux
   distributions come with UCS4 builds of Python). These builds then use a 32-bit
   type for :c:type:`Py_UNICODE` and store Unicode data internally as UCS4. On
   platforms where :c:type:`wchar_t` is available and compatible with the chosen
   Python Unicode build variant, :c:type:`Py_UNICODE` is a typedef alias for
   :c:type:`wchar_t` to enhance native platform compatibility. On all other
   platforms, :c:type:`Py_UNICODE` is a typedef alias for either
   :c:type:`unsigned short` (UCS2) or :c:type:`unsigned long` (UCS4).

Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.

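The incompatibility follows from the width of one storage unit. The following
is a standalone sketch in plain C (no Python headers) of how the same code
point occupies a different number of units under each width. The helper names
are hypothetical, and the sketch assumes that a 16-bit build stores code points
above U+FFFF as UTF-16 surrogate pairs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the Python C API): write the 16-bit
   units needed for code point cp into out and return how many units
   were written (1 or 2).  Assumption: code points above U+FFFF are
   split into a UTF-16 surrogate pair. */
size_t
units16_for_code_point(uint32_t cp, uint16_t out[2])
{
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: with a 32-bit storage type every code point
   fits in exactly one unit. */
size_t
units32_for_code_point(uint32_t cp, uint32_t out[1])
{
    out[0] = cp;
    return 1;
}
```

An extension compiled against one unit width would therefore misread buffers
produced by an interpreter built with the other.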

.. c:type:: PyUnicodeObject

   This subtype of :c:type:`PyObject` represents a Python Unicode object.


.. c:var:: PyTypeObject PyUnicode_Type

   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
   is exposed to Python code as ``unicode`` and ``types.UnicodeType``.

The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:


.. c:function:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.

   .. versionchanged:: 2.2
      Allowed subtypes to be accepted.


.. c:function:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.

   .. versionadded:: 2.2


.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the object in :c:type:`Py_UNICODE` units. *o* has to be a
   :c:type:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      Previously, this function returned an :c:type:`int`. This might require
      changes in your code to properly support 64-bit systems.


.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the object's internal buffer in bytes. *o* has to be a
   :c:type:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      Previously, this function returned an :c:type:`int`. This might require
      changes in your code to properly support 64-bit systems.


.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)

   Return a pointer to the internal :c:type:`Py_UNICODE` buffer of the object. *o*
   has to be a :c:type:`PyUnicodeObject` (not checked).


.. c:function:: const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to the internal buffer of the object. *o* has to be a
   :c:type:`PyUnicodeObject` (not checked).


.. c:function:: int PyUnicode_ClearFreeList()

   Clear the free list. Return the total number of freed items.

   .. versionadded:: 2.6


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most commonly needed
ones are available through these macros, which are mapped to C functions
depending on the Python configuration.


.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a whitespace character.


.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a lowercase character.


.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an uppercase character.


.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a titlecase character.


.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a linebreak character.


.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a decimal character.


.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a digit character.


.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a numeric character.


.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphabetic character.


.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphanumeric character.

These APIs can be used for fast direct character conversions:


.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.


.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.


.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.


.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a decimal positive integer. Return
   ``-1`` if this is not possible. This macro does not raise exceptions.


.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible. This macro does not raise exceptions.


.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
   possible. This macro does not raise exceptions.

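Because the conversion macros signal failure with a sentinel value and set no
exception, the caller has to test every result. The following standalone
sketch shows that calling pattern in plain C. ``demo_todecimal`` is a
hypothetical stand-in for :c:func:`Py_UNICODE_TODECIMAL` that knows only ASCII
and ARABIC-INDIC digits, so the block compiles without Python headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Py_UNICODE_TODECIMAL: handles only ASCII
   digits and ARABIC-INDIC digits (U+0660..U+0669) and returns -1 for
   every other character. */
int
demo_todecimal(uint32_t ch)
{
    if (ch >= '0' && ch <= '9')
        return (int)(ch - '0');
    if (ch >= 0x0660 && ch <= 0x0669)
        return (int)(ch - 0x0660);
    return -1;
}

/* Parse a run of decimal characters into *value.  Returns 0 on success
   and -1 if the buffer is empty or holds a non-decimal character.  The
   caller must report the error itself, since the conversion sets no
   exception.  This sketch does not check for overflow. */
int
demo_parse_decimal(const uint32_t *buf, size_t len, long *value)
{
    long result = 0;
    size_t i;

    if (len == 0)
        return -1;
    for (i = 0; i < len; i++) {
        int d = demo_todecimal(buf[i]);
        if (d < 0)
            return -1;          /* sentinel: not a decimal character */
        result = result * 10 + d;
    }
    *value = result;
    return 0;
}
```

In an extension module, the ``-1`` branch is the place to set an exception
such as :exc:`ValueError` before returning.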

Plain Py_UNICODE
""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:


.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
   may be *NULL*, which causes the contents to be undefined. It is the user's
   responsibility to fill in the needed data. The buffer is copied into the new
   object. If the buffer is not *NULL*, the return value might be a shared object.
   Therefore, modification of the resulting Unicode object is only allowed when *u*
   is *NULL*.

   .. versionchanged:: 2.5
      Previously, this function used an :c:type:`int` type for *size*. This might
      require changes in your code to properly support 64-bit systems.


.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the char buffer *u*. The bytes will be interpreted
   as being UTF-8 encoded. *u* may also be *NULL*, which causes the contents to be
   undefined. It is the user's responsibility to fill in the needed data. The
   buffer is copied into the new object. If the buffer is not *NULL*, the return
   value might be a shared object. Therefore, modification of the resulting
   Unicode object is only allowed when *u* is *NULL*.

   .. versionadded:: 2.6


.. c:function:: PyObject* PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.

   .. versionadded:: 2.6


.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)

   Take a C :c:func:`printf`\ -style format string and a variable number of
   arguments, calculate the size of the resulting Python unicode string and return
   a string with the values formatted into it. The variable arguments must be C
   types and must correspond exactly to the format characters in the *format*
   string. The following format characters are allowed:

   .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
   .. % because not all compilers support the %z width modifier -- we fake it
   .. % when necessary via interpolating PY_FORMAT_SIZE_T.

   .. tabularcolumns:: |l|l|L|

   +-------------------+---------------------+--------------------------------+
   | Format Characters | Type                | Comment                        |
   +===================+=====================+================================+
   | :attr:`%%`        | *n/a*               | The literal % character.       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%c`        | int                 | A single character,            |
   |                   |                     | represented as a C int.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%d`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%d")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%u`        | unsigned int        | Exactly equivalent to          |
   |                   |                     | ``printf("%u")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%ld`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%ld")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
   |                   |                     | ``printf("%lu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zd")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zu`       | size_t              | Exactly equivalent to          |
   |                   |                     | ``printf("%zu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%i`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%i")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%x`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%x")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%s`        | char\*              | A null-terminated C character  |
   |                   |                     | array.                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%p`        | void\*              | The hex representation of a C  |
   |                   |                     | pointer. Mostly equivalent to  |
   |                   |                     | ``printf("%p")`` except that   |
   |                   |                     | it is guaranteed to start with |
   |                   |                     | the literal ``0x`` regardless  |
   |                   |                     | of what the platform's         |
   |                   |                     | ``printf`` yields.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%U`        | PyObject\*          | A unicode object.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
   |                   |                     | *NULL*) and a null-terminated  |
   |                   |                     | C character array as a second  |
   |                   |                     | parameter (which will be used, |
   |                   |                     | if the first parameter is      |
   |                   |                     | *NULL*).                       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling          |
   |                   |                     | :c:func:`PyObject_Unicode`.    |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling          |
   |                   |                     | :c:func:`PyObject_Repr`.       |
   +-------------------+---------------------+--------------------------------+

   An unrecognized format character causes all the rest of the format string to be
   copied as-is to the result string, and any extra arguments are discarded.

   .. versionadded:: 2.6


.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)

   Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
   arguments.

   .. versionadded:: 2.6


.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal
   :c:type:`Py_UNICODE` buffer, or *NULL* if *unicode* is not a Unicode object.
   Note that the resulting :c:type:`Py_UNICODE*` string may contain embedded
   null characters, which would cause the string to be truncated when used in
   most C functions.

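The truncation risk can be seen without any Python headers. The sketch below
uses a plain :c:type:`wchar_t` array as a stand-in for the internal buffer and
a hypothetical helper, ``demo_count_char``, that walks the buffer by explicit
length. In real code the length would come from :c:func:`PyUnicode_GetSize`
or :c:func:`PyUnicode_GET_SIZE`.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: count occurrences of ch using an explicit
   length, so embedded null characters do not end the scan early. */
size_t
demo_count_char(const wchar_t *buf, size_t len, wchar_t ch)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++) {
        if (buf[i] == ch)
            n++;
    }
    return n;
}
```

For a four-character buffer holding ``a``, ``b``, a null character and ``a``,
``wcslen()`` stops at the embedded null and sees two characters, while the
length-based loop examines all four.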

.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the length of the Unicode object.

   .. versionchanged:: 2.5
      Previously, this function returned an :c:type:`int`. This might require
      changes in your code to properly support 64-bit systems.


.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.

   String and other char buffer compatible objects are decoded according to the
   given encoding and using the error handling defined by *errors*. Both can be
   *NULL* to have the interface use the default values (see the next section for
   details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns *NULL* if there was an error. The caller is responsible for
   decref'ing the returned objects.


.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
   throughout the interpreter whenever coercion to Unicode is needed.

If the platform supports :c:type:`wchar_t` and provides a header file wchar.h,
Python can interface directly to this type using the following functions.
Support is optimized if Python's own :c:type:`Py_UNICODE` type is identical to
the system's :c:type:`wchar_t`.


wchar_t Support
"""""""""""""""

:c:type:`wchar_t` support for platforms which support it:

.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given size.
   Return *NULL* on failure.

   .. versionchanged:: 2.5
      Previously, this function used an :c:type:`int` type for *size*. This might
      require changes in your code to properly support 64-bit systems.


.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
   *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :c:type:`wchar_t` characters
   copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :c:type:`wchar_t` string is 0-terminated in case this is
   required by the application. Also, note that the :c:type:`wchar_t*` string
   might contain null characters, which would cause the string to be truncated
   when used with most C functions.

   .. versionchanged:: 2.5
      Previously, this function returned an :c:type:`int` type and used an
      :c:type:`int` type for *size*. This might require changes in your code to
      properly support 64-bit systems.

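One way to satisfy the termination requirement is to reserve one slot beyond
the requested size and write the terminator after the copy. The sketch below
models that pattern in plain C. ``demo_copy_wide`` is a hypothetical stand-in
that mimics only the "copy at most *size* characters, no guaranteed
terminator" part of the contract described above; it is not the Python
function.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical model of the copy contract: copy at most size
   characters and never write a terminator.  Returns the number of
   characters copied. */
size_t
demo_copy_wide(const wchar_t *src, size_t src_len, wchar_t *w, size_t size)
{
    size_t n = src_len < size ? src_len : size;
    size_t i;

    for (i = 0; i < n; i++)
        w[i] = src[i];
    return n;
}

/* Caller-side pattern: pass capacity - 1 as the size so that one slot
   always remains for the explicit terminator. */
size_t
demo_copy_terminated(const wchar_t *src, size_t src_len,
                     wchar_t *w, size_t capacity)
{
    size_t n;

    if (capacity == 0)
        return 0;
    n = demo_copy_wide(src, src_len, w, capacity - 1);
    w[n] = L'\0';
    return n;
}
```

With the real API, the same idea would mean passing the buffer capacity minus
one as *size* and storing ``0`` at the index given by a non-negative return
value.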

.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as the ones of the built-in :func:`unicode` Unicode
object constructor.

Setting *encoding* to *NULL* causes the default encoding to be used, which is
ASCII. The file system calls should use :c:data:`Py_FileSystemDefaultEncoding`
as the encoding for file names. This variable should be treated as read-only: on
some systems, it will be a pointer to a static string; on others, it will change
at run-time (such as when the application invokes setlocale).

Error handling is set by *errors*, which may also be set to *NULL*, meaning to
use the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface. For simplicity, only deviations from the
following generic ones are documented.

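The *errors* convention can be illustrated with a standalone sketch. The
function below is a hypothetical ASCII decoder in plain C, not one of the
built-in codecs. It treats *NULL* like ``"strict"``, as described above. The
``"ignore"`` and ``"replace"`` handler names are assumptions taken from
Python's standard codec error handlers; this section itself names only
``"strict"``.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical ASCII decoder illustrating the errors convention.
   NULL or "strict" fails on the first invalid byte, "ignore" drops
   it, "replace" substitutes U+FFFD.  Returns the number of code
   points written to out, or -1 on error.  out must have room for
   len entries.  Simplification: an unknown handler name also fails
   at the first invalid byte. */
long
demo_decode_ascii(const unsigned char *s, size_t len,
                  const char *errors, unsigned long *out)
{
    size_t i;
    long n = 0;

    if (errors == NULL)
        errors = "strict";      /* NULL selects the codec's default */
    for (i = 0; i < len; i++) {
        if (s[i] < 0x80)
            out[n++] = s[i];
        else if (strcmp(errors, "ignore") == 0)
            continue;
        else if (strcmp(errors, "replace") == 0)
            out[n++] = 0xFFFD;
        else
            return -1;          /* strict: caller would see an exception */
    }
    return n;
}
```

The ``-1`` return plays the role of the *NULL* result that the real codec
functions produce after setting an exception.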

Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`unicode` built-in function. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      Previously, this function used an :c:type:`int` type for *size*. This might
      require changes in your code to properly support 64-bit systems.


.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given size and return a Python
   string object. *encoding* and *errors* have the same meaning as the parameters
   of the same name in the Unicode :meth:`~unicode.encode` method. The codec
   to be used is looked up using the Python codec registry. Return *NULL* if
   an exception was raised by the codec.

   .. versionchanged:: 2.5
      Previously, this function used an :c:type:`int` type for *size*. This might
      require changes in your code to properly support 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)

   Encode a Unicode object and return the result as a Python string object.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the Unicode :meth:`~unicode.encode` method. The codec to be used is looked
   up using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


Sandro Tosi	98ed08f	2012-01-14 16:42:02 +0100	[diff] [blame]	482	.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char s, Py_ssize_t size, const char errors)
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	483
				484	Create a Unicode object by decoding size bytes of the UTF-8 encoded string
				485	s. Return NULL if an exception was raised by the codec.
				486
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	487	.. versionchanged:: 2.5
Sandro Tosi	98ed08f	2012-01-14 16:42:02 +0100	[diff] [blame]	488	This function used an :c:type:`int` type for size. This might require
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	489	changes in your code for properly supporting 64-bit systems.
				490

.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

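The idea of a trailing incomplete sequence can be illustrated without the
interpreter. The following is a minimal sketch, not CPython's implementation:
``utf8_complete_prefix`` is a hypothetical helper that reports how many leading
bytes of a buffer end on a sequence boundary, which is the kind of count a
stateful decoder would report through *consumed* for well-formed input. It
assumes the input is otherwise valid and performs no validation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API: return how many
   leading bytes of s[0..size) can be decoded without splitting a
   multi-byte UTF-8 sequence.  The input is not validated. */
static size_t
utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t back;

    assert(s != NULL || size == 0);
    /* A sequence is at most 4 bytes long, so the lead byte of a truncated
       sequence is at most 3 bytes before the end of the buffer. */
    for (back = 1; back <= 3 && back <= size; back++) {
        unsigned char c = s[size - back];
        size_t need;

        if ((c & 0xC0) == 0x80)
            continue;               /* continuation byte, keep looking */
        if (c < 0x80)
            need = 1;               /* ASCII */
        else if ((c & 0xE0) == 0xC0)
            need = 2;
        else if ((c & 0xF0) == 0xE0)
            need = 3;
        else if ((c & 0xF8) == 0xF0)
            need = 4;
        else
            return size;            /* invalid lead byte: left to the decoder */
        /* Cut before the lead byte only if its sequence is incomplete. */
        return need > back ? size - back : size;
    }
    return size;
}
```

A caller reading from a stream would keep the bytes after the returned offset
and prepend them to the next chunk before decoding again.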

.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)

   Encode a Unicode object using UTF-8 and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


UTF-32 Codecs
"""""""""""""

These are the UTF-32 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first four bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output.

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   In a narrow build code points outside the BMP will be decoded as surrogate pairs.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6

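The byte order rules above can be sketched in plain C. This is only an
illustration of the documented behaviour, not the decoder's actual code;
``utf32_check_bom`` is a hypothetical name. It consumes a leading BOM and
updates ``*byteorder`` only when the caller passed ``0``, and otherwise leaves
the BOM in place as ordinary data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API.  Returns the number
   of BOM bytes to skip (0 or 4) and updates *byteorder when a BOM selects
   the byte order. */
static size_t
utf32_check_bom(const unsigned char *s, size_t size, int *byteorder)
{
    assert(byteorder != NULL);
    if (*byteorder != 0 || size < 4)
        return 0;               /* explicit order: a BOM is ordinary data */
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
        *byteorder = -1;        /* FF FE 00 00: little endian */
        return 4;
    }
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
        *byteorder = 1;         /* 00 00 FE FF: big endian */
        return 4;
    }
    return 0;                   /* no BOM: *byteorder stays 0 (native) */
}
```

Passing a pointer to an ``int`` initialised to ``0`` therefore lets the data
choose the byte order, while ``-1`` or ``1`` forces it.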

.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
   by four) as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.6


.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-32 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM (U+FEFF). In the other two modes, no BOM is prepended.

   If ``Py_UNICODE_WIDE`` is not defined, surrogate pairs will be output
   as a single code point.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6

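As an illustration of the two points above, the sketch below shows how a
surrogate pair collapses into one code point and how a code point is laid out
in each explicit byte order. Both function names are hypothetical and the code
is not taken from CPython; it only assumes the standard UTF-16 surrogate
arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair, as found in a
   narrow build's buffer, into the single code point it represents. */
static uint32_t
combine_surrogates(uint16_t high, uint16_t low)
{
    assert(high >= 0xD800 && high <= 0xDBFF);
    assert(low >= 0xDC00 && low <= 0xDFFF);
    return 0x10000u + (((uint32_t)(high - 0xD800) << 10)
                       | (uint32_t)(low - 0xDC00));
}

/* Hypothetical helper: write one code point as four bytes.
   byteorder is -1 (little endian) or 1 (big endian). */
static void
put_utf32(unsigned char out[4], uint32_t cp, int byteorder)
{
    int i;

    assert(byteorder == -1 || byteorder == 1);
    for (i = 0; i < 4; i++) {
        int shift = (byteorder == -1) ? 8 * i : 8 * (3 - i);
        out[i] = (unsigned char)((cp >> shift) & 0xFF);
    }
}
```

For example, the pair ``0xD83D 0xDE00`` combines to U+1F600, which is written
as ``00 01 F6 00`` in big endian order.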

.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)

   Return a Python string using the UTF-32 encoding in native byte order. The
   string always starts with a BOM. Error handling is "strict". Return
   *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6


UTF-16 Codecs
"""""""""""""

These are the UTF-16 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first two bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output (where it will result in
   either a ``\ufeff`` or a ``\ufffe`` character).

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size* and an :c:type:`int *`
      type for *consumed*. This might require changes in your code for
      properly supporting 64-bit systems.

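The two kinds of incomplete trailing data named above, an odd byte and a split
surrogate pair, can be detected as in the following sketch. It is an
illustration under the assumption that the byte order is already known, not
the decoder's real code, and ``utf16_complete_prefix`` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API: return how many
   leading bytes of s[0..size) form complete UTF-16 code points.
   byteorder is -1 (little endian) or 1 (big endian). */
static size_t
utf16_complete_prefix(const unsigned char *s, size_t size, int byteorder)
{
    size_t usable = size - (size % 2);      /* drop a dangling odd byte */

    assert(byteorder == -1 || byteorder == 1);
    if (usable >= 2) {
        const unsigned char *p = s + usable - 2;
        unsigned int unit = (byteorder == -1)
            ? ((unsigned int)p[0] | ((unsigned int)p[1] << 8))
            : (((unsigned int)p[0] << 8) | (unsigned int)p[1]);

        /* A high surrogate at the very end is still waiting for its
           low surrogate, so it must not be decoded yet. */
        if (unit >= 0xD800 && unit <= 0xDBFF)
            usable -= 2;
    }
    return usable;
}
```

The bytes after the returned offset would be carried over and decoded together
with the next chunk of input.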

.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-16 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM (U+FEFF). In the other two modes, no BOM is prepended.

   If ``Py_UNICODE_WIDE`` is defined, a single :c:type:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

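To make the surrogate pair case concrete, here is a small sketch of how one
code point maps to one or two UTF-16 code units. It uses the standard UTF-16
arithmetic; ``utf16_units`` is a hypothetical helper and is not how CPython
names or structures this step.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Python C API: store the UTF-16
   code units for cp in out and return how many were written (1 or 2). */
static int
utf16_units(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;      /* BMP: one unit, value unchanged */
        return 1;
    }
    cp -= 0x10000;                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

A value such as U+1F600 therefore occupies four bytes in the encoded output,
while every BMP character occupies two.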

.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)

   Return a Python string using the UTF-16 encoding in native byte order. The
   string always starts with a BOM. Error handling is "strict". Return
   *NULL* if an exception was raised by the codec.


UTF-7 Codecs
""""""""""""

These are the UTF-7 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`. If
   *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
   be treated as an error. Those bytes will not be decoded and the number of
   bytes that have been decoded will be stored in *consumed*.


.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
   return a Python string object. Return *NULL* if an exception was raised by
   the codec.

   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
   special meaning) will be encoded in base-64. If *base64WhiteSpace* is
   nonzero, whitespace will be encoded in base-64. Both are set to zero for the
   Python "utf-7" codec.


Unicode-Escape Codecs
"""""""""""""""""""""

These are the "Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
   return a Python string object. Return *NULL* if an exception was raised by the
   codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Unicode-Escape and return the result as a Python
   string object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


Raw-Unicode-Escape Codecs
"""""""""""""""""""""""""

These are the "Raw Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
   and return a Python string object. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Raw-Unicode-Escape and return the result as a
   Python string object. Error handling is "strict". Return *NULL* if an exception
   was raised by the codec.


Latin-1 Codecs
""""""""""""""

These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
ordinals and only these are accepted by the codecs during encoding.


.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and return
   a Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

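Because Latin-1 covers exactly the ordinals 0 to 255, encoding amounts to a
range check followed by a narrowing copy. The sketch below illustrates that
rule only; ``encode_latin1`` is a hypothetical function, and it reports the
offending position instead of invoking an error handler as the real codec
would.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Python C API: copy len ordinals
   from src into dst as Latin-1 bytes.  Returns -1 on success, or the
   index of the first ordinal above 0xFF, which Latin-1 cannot encode. */
static ptrdiff_t
encode_latin1(const uint32_t *src, size_t len, unsigned char *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        if (src[i] > 0xFF)
            return (ptrdiff_t)i;    /* outside the first 256 ordinals */
        dst[i] = (unsigned char)src[i];
    }
    return -1;
}
```

The ASCII codec follows the same pattern with a limit of ``0x7F``.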

.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)

   Encode a Unicode object using Latin-1 and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


ASCII Codecs
""""""""""""

These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
codes generate errors.


.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)

   Encode a Unicode object using ASCII and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


Character Map Codecs
""""""""""""""""""""

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.

Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or None
(meaning "undefined mapping" and causing an error).

Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or None
(meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode ordinal
(when decoding) or a Latin-1 ordinal (when encoding). Because of this, mappings
only need to contain those entries which map characters to different code
points.

These are the mapping codec APIs:

.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object. Return *NULL* if an exception was raised by the
   codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can
   be a dictionary that maps bytes, or a Unicode string that is treated as a
   lookup table. Byte values greater than or equal to the length of the string
   and U+FFFE "characters" are treated as "undefined mapping".

   .. versionchanged:: 2.4
      Allowed unicode string as mapping argument.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

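The lookup-table form of the mapping can be pictured as an array indexed by
byte value. The following sketch illustrates the two "undefined mapping" cases
for such a table; ``charmap_lookup`` and ``UNDEFINED_MAPPING`` are hypothetical
names, and indices past the end of the table are treated as undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNDEFINED_MAPPING 0xFFFEu   /* marker for "no character here" */

/* Hypothetical helper, not part of the Python C API: return the Unicode
   ordinal that byte maps to, or -1 for "undefined mapping". */
static long
charmap_lookup(const uint16_t *table, size_t table_len, unsigned char byte)
{
    assert(table != NULL || table_len == 0);
    if ((size_t)byte >= table_len)
        return -1;                  /* byte lies outside the table */
    if (table[byte] == UNDEFINED_MAPPING)
        return -1;                  /* explicitly marked as undefined */
    return (long)table[byte];
}
```

In the real codec an undefined mapping is passed to the error handler selected
by *errors*; this sketch only signals it with ``-1``.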

.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return a Python string object. Return *NULL* if an
   exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

   Encode a Unicode object using the given *mapping* object and return the result
   as a Python string object. Error handling is "strict". Return *NULL* if an
   exception was raised by the codec.

The following codec API is special in that it maps Unicode to Unicode.


.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)

   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
   character mapping *table* to it and return the resulting Unicode object. Return
   *NULL* when an exception was raised by the codec.

   The *mapping* table must map Unicode ordinal integers to Unicode ordinal
   integers or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	930
MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.


.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.

   .. versionadded:: 2.5


.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


Methods & Slots
"""""""""""""""

.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

They all return *NULL* or ``-1`` if an exception occurs.


.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string giving a list of Unicode strings. If *sep* is *NULL*, splitting
   will be done at all whitespace substrings. Otherwise, splits occur at the given
   separator. At most *maxsplit* splits will be done. If negative, no limit is
   set. Separators are not included in the resulting list.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *maxsplit*. This might require
      changes in your code for properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break. If *keepend* is 0, the line break
   characters are not included in the resulting strings.


.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or ``None`` (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be *NULL* which indicates to
   use the default error handling.


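The mapping-table rules for :c:func:`PyUnicode_Translate` can be illustrated
with a small helper. This is only a sketch: ``upper_a_drop_b`` is a
hypothetical name, and building the table with :c:func:`Py_BuildValue` is one
convenient choice among several.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: return a new reference to a copy of str in which
      every "a" becomes "A" and every "b" is deleted, or NULL with an
      exception set. */
   static PyObject *
   upper_a_drop_b(PyObject *str)
   {
       PyObject *table, *result;

       /* Builds {ord('a'): ord('A'), ord('b'): None}. Every other ordinal
          is unmapped, so the lookup fails and the character is copied. */
       table = Py_BuildValue("{i:i,i:O}", 'a', 'A', 'b', Py_None);
       if (table == NULL)
           return NULL;

       /* errors == NULL selects the default error handling. */
       result = PyUnicode_Translate(str, table, NULL);
       Py_DECREF(table);
       return result;
   }

The caller owns the returned reference and must release it with
:c:func:`Py_DECREF` when it is no longer needed.
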
.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given *separator* and return the resulting
   Unicode string.


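:c:func:`PyUnicode_Split` and :c:func:`PyUnicode_Join` are often used as a
pair. The following sketch collapses runs of whitespace into a single
separator; ``normalize_whitespace`` is a hypothetical helper name, not part of
the API.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: replace each run of whitespace in s with one
      occurrence of sep. Returns a new reference, or NULL on error. */
   static PyObject *
   normalize_whitespace(PyObject *s, PyObject *sep)
   {
       PyObject *parts, *joined;

       /* A NULL separator splits at whitespace; a negative maxsplit
          places no limit on the number of splits. */
       parts = PyUnicode_Split(s, NULL, -1);
       if (parts == NULL)
           return NULL;

       joined = PyUnicode_Join(sep, parts);
       Py_DECREF(parts);   /* the list is no longer needed */
       return joined;
   }
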
.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return 1 if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
   0 otherwise. Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.


.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
   backward search). The return value is the index of the first match; a value of
   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   occurred and an exception has been set.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.


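:c:func:`PyUnicode_Tailmatch` and :c:func:`PyUnicode_Find` use different error
conventions (``-1`` versus ``-2``), which is easy to get wrong. The sketch
below wraps both; ``has_prefix`` and ``find_first`` are hypothetical helper
names, and passing ``PY_SSIZE_T_MAX`` as *end* assumes that out-of-range
indices are clipped to the string length, as they are for slices.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: 1 if str starts with prefix, 0 if not,
      -1 on error. */
   static int
   has_prefix(PyObject *str, PyObject *prefix)
   {
       /* direction == -1 requests a prefix match. */
       return (int)PyUnicode_Tailmatch(str, prefix, 0, PY_SSIZE_T_MAX, -1);
   }

   /* Hypothetical helper: store the index of the first occurrence of
      substr in *index. Returns 1 if found, 0 if not found, -1 on error. */
   static int
   find_first(PyObject *str, PyObject *substr, Py_ssize_t *index)
   {
       /* direction == 1 requests a forward search. */
       Py_ssize_t pos = PyUnicode_Find(str, substr, 0, PY_SSIZE_T_MAX, 1);

       if (pos == -2)      /* error, exception already set */
           return -1;
       if (pos == -1)      /* no match, no exception */
           return 0;
       *index = pos;
       return 1;
   }
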
.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``. Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function returned an :c:type:`int` type and used an :c:type:`int`
      type for *start* and *end*. This might require changes in your code for
      properly supporting 64-bit systems.


.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == -1 means replace all
   occurrences.

   .. versionchanged:: 2.5
      This function used an :c:type:`int` type for *maxcount*. This might
      require changes in your code for properly supporting 64-bit systems.


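As a rough illustration of how :c:func:`PyUnicode_Count` and
:c:func:`PyUnicode_Replace` fit together, the following sketch replaces every
occurrence of a substring and also reports how many occurrences there were.
``replace_all`` is a hypothetical helper name, and ``PY_SSIZE_T_MAX`` as *end*
again assumes slice-style clipping.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: replace all occurrences of substr in str with
      replstr and store the number of occurrences in *count. Returns a new
      reference, or NULL with an exception set. */
   static PyObject *
   replace_all(PyObject *str, PyObject *substr, PyObject *replstr,
               Py_ssize_t *count)
   {
       *count = PyUnicode_Count(str, substr, 0, PY_SSIZE_T_MAX);
       if (*count < 0)
           return NULL;

       /* maxcount == -1 replaces every occurrence. */
       return PyUnicode_Replace(str, substr, replstr, -1);
   }
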
.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
   respectively.


.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   :exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
   with a :exc:`UnicodeDecodeError`.

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.


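Because :c:func:`PyUnicode_RichCompare` returns an object rather than a C
integer, callers usually reduce the result to a flag. A minimal sketch,
assuming that a :const:`Py_NotImplemented` result should be treated as "not
equal" (``unicode_equal`` is a hypothetical helper name):

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: 1 if left == right, 0 if not, -1 on error. */
   static int
   unicode_equal(PyObject *left, PyObject *right)
   {
       PyObject *res = PyUnicode_RichCompare(left, right, Py_EQ);
       int equal;

       if (res == NULL)
           return -1;

       /* Py_False and Py_NotImplemented both count as "not equal" here;
          that is a choice made for this sketch. */
       equal = (res == Py_True);
       Py_DECREF(res);
       return equal;
   }
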
.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``.


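The ``format % args`` analogy can be sketched as follows. ``describe_count`` is
a hypothetical helper; the sketch assumes that :c:func:`PyUnicode_FromString`
is available in the targeted Python version and that *name* contains only
ASCII characters.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: build the Unicode string
      u"%s has %d items" % (name, n). Returns a new reference or NULL. */
   static PyObject *
   describe_count(const char *name, int n)
   {
       PyObject *format, *args, *result;

       format = PyUnicode_FromString("%s has %d items");
       if (format == NULL)
           return NULL;

       /* args must be a tuple when more than one value is formatted. */
       args = Py_BuildValue("(si)", name, n);
       if (args == NULL) {
           Py_DECREF(format);
           return NULL;
       }

       result = PyUnicode_Format(format, args);
       Py_DECREF(format);
       Py_DECREF(args);
       return result;
   }
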
.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return true or false
   accordingly.

   *element* has to coerce to a one element Unicode string. ``-1`` is returned if
   there was an error.