Blame - Doc/c-api/unicode.rst - platform/external/python/cpython2

Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1	.. highlightlang:: c
				2
				3	.. _unicodeobjects:
				4
Unicode Objects and Codecs
--------------------------
				7
				8	.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
				9
Unicode Objects
^^^^^^^^^^^^^^^
				12
				13
Unicode Type
""""""""""""
				16
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	17	These are the basic Unicode object types used for the Unicode implementation in
				18	Python:
				19
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	20
				21	.. ctype:: Py_UNICODE
				22
   This type represents the storage type which is used by Python internally as
   the basis for holding Unicode ordinals. Python's default builds use a 16-bit type
				25	for :ctype:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
				26	possible to build a UCS4 version of Python (most recent Linux distributions come
				27	with UCS4 builds of Python). These builds then use a 32-bit type for
				28	:ctype:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
				29	where :ctype:`wchar_t` is available and compatible with the chosen Python
				30	Unicode build variant, :ctype:`Py_UNICODE` is a typedef alias for
				31	:ctype:`wchar_t` to enhance native platform compatibility. On all other
				32	platforms, :ctype:`Py_UNICODE` is a typedef alias for either :ctype:`unsigned
				33	short` (UCS2) or :ctype:`unsigned long` (UCS4).
				34
				35	Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
				36	this in mind when writing extensions or interfaces.
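
As a rough illustration of why the two build variants lay out text differently, the following standalone sketch shows how a code point maps onto 16-bit storage units. It is plain C and does not use the Python API. The function name ``encode_utf16_units`` is hypothetical, and the sketch assumes that a narrow (UCS2) build represents code points above U+FFFF as surrogate pairs, whereas a UCS4 build holds each code point in a single 32-bit unit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Python C API: store the code point
   cp as 16-bit units in out and return the number of units written. */
size_t
encode_utf16_units(uint32_t cp, uint16_t out[2])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: a single unit */
        return 1;
    }
    cp -= 0x10000;
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For U+1F600 the helper yields the two units 0xD83D and 0xDE00, so the same text can occupy a different number of storage units depending on the unit width.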
				37
				38
				39	.. ctype:: PyUnicodeObject
				40
				41	This subtype of :ctype:`PyObject` represents a Python Unicode object.
				42
				43
				44	.. cvar:: PyTypeObject PyUnicode_Type
				45
				46	This instance of :ctype:`PyTypeObject` represents the Python Unicode type. It
				47	is exposed to Python code as ``unicode`` and ``types.UnicodeType``.
				48
				49	The following APIs are really C macros and can be used to do fast checks and to
				50	access internal read-only data of Unicode objects:
				51
				52
				53	.. cfunction:: int PyUnicode_Check(PyObject *o)
				54
				55	Return true if the object o is a Unicode object or an instance of a Unicode
				56	subtype.
				57
				58	.. versionchanged:: 2.2
				59	Allowed subtypes to be accepted.
				60
				61
				62	.. cfunction:: int PyUnicode_CheckExact(PyObject *o)
				63
				64	Return true if the object o is a Unicode object, but not an instance of a
				65	subtype.
				66
				67	.. versionadded:: 2.2
				68
				69
				70	.. cfunction:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
				71
				72	Return the size of the object. o has to be a :ctype:`PyUnicodeObject` (not
				73	checked).
				74
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	75	.. versionchanged:: 2.5
				76	This function returned an :ctype:`int` type. This might require changes
				77	in your code for properly supporting 64-bit systems.
				78
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	79
				80	.. cfunction:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
				81
				82	Return the size of the object's internal buffer in bytes. o has to be a
				83	:ctype:`PyUnicodeObject` (not checked).
				84
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	85	.. versionchanged:: 2.5
				86	This function returned an :ctype:`int` type. This might require changes
				87	in your code for properly supporting 64-bit systems.
				88
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	89
				90	.. cfunction:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
				91
				92	Return a pointer to the internal :ctype:`Py_UNICODE` buffer of the object. o
				93	has to be a :ctype:`PyUnicodeObject` (not checked).
				94
				95
				96	.. cfunction:: const char* PyUnicode_AS_DATA(PyObject *o)
				97
				98	Return a pointer to the internal buffer of the object. o has to be a
				99	:ctype:`PyUnicodeObject` (not checked).
				100
Christian Heimes	3b718a7	2008-02-14 12:47:33 +0000	[diff] [blame]	101
Georg Brandl	36b30b5	2009-07-24 16:46:38 +0000	[diff] [blame]	102	.. cfunction:: int PyUnicode_ClearFreeList()
Christian Heimes	3b718a7	2008-02-14 12:47:33 +0000	[diff] [blame]	103
				104	Clear the free list. Return the total number of freed items.
				105
				106	.. versionadded:: 2.6
				107
Georg Brandl	36b30b5	2009-07-24 16:46:38 +0000	[diff] [blame]	108
Unicode Character Properties
""""""""""""""""""""""""""""
				111
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	112	Unicode provides many different character properties. The most often needed ones
				113	are available through these macros which are mapped to C functions depending on
				114	the Python configuration.
				115
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	116
				117	.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
				118
				119	Return 1 or 0 depending on whether ch is a whitespace character.
				120
				121
				122	.. cfunction:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
				123
				124	Return 1 or 0 depending on whether ch is a lowercase character.
				125
				126
				127	.. cfunction:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
				128
				129	Return 1 or 0 depending on whether ch is an uppercase character.
				130
				131
				132	.. cfunction:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
				133
				134	Return 1 or 0 depending on whether ch is a titlecase character.
				135
				136
				137	.. cfunction:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
				138
				139	Return 1 or 0 depending on whether ch is a linebreak character.
				140
				141
				142	.. cfunction:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
				143
				144	Return 1 or 0 depending on whether ch is a decimal character.
				145
				146
				147	.. cfunction:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
				148
				149	Return 1 or 0 depending on whether ch is a digit character.
				150
				151
				152	.. cfunction:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
				153
				154	Return 1 or 0 depending on whether ch is a numeric character.
				155
				156
				157	.. cfunction:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
				158
				159	Return 1 or 0 depending on whether ch is an alphabetic character.
				160
				161
				162	.. cfunction:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
				163
				164	Return 1 or 0 depending on whether ch is an alphanumeric character.
				165
				166	These APIs can be used for fast direct character conversions:
				167
				168
				169	.. cfunction:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
				170
				171	Return the character ch converted to lower case.
				172
				173
				174	.. cfunction:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
				175
				176	Return the character ch converted to upper case.
				177
				178
				179	.. cfunction:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
				180
				181	Return the character ch converted to title case.
				182
				183
				184	.. cfunction:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
				185
				186	Return the character ch converted to a decimal positive integer. Return
				187	``-1`` if this is not possible. This macro does not raise exceptions.
				188
				189
				190	.. cfunction:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
				191
				192	Return the character ch converted to a single digit integer. Return ``-1`` if
				193	this is not possible. This macro does not raise exceptions.
				194
				195
				196	.. cfunction:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
				197
				198	Return the character ch converted to a double. Return ``-1.0`` if this is not
				199	possible. This macro does not raise exceptions.
				200
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	201
Plain Py_UNICODE
""""""""""""""""
				204
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	205	To create Unicode objects and access their basic sequence properties, use these
				206	APIs:
				207
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	208
				209	.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
				210
Georg Brandl	b8d0e36	2010-11-26 07:53:50 +0000	[diff] [blame]	211	Create a Unicode object from the Py_UNICODE buffer u of the given size. u
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	212	may be NULL which causes the contents to be undefined. It is the user's
				213	responsibility to fill in the needed data. The buffer is copied into the new
				214	object. If the buffer is not NULL, the return value might be a shared object.
				215	Therefore, modification of the resulting Unicode object is only allowed when u
				216	is NULL.
				217
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	218	.. versionchanged:: 2.5
				219	This function used an :ctype:`int` type for size. This might require
				220	changes in your code for properly supporting 64-bit systems.
				221
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	222
Georg Brandl	79cdff0	2010-10-17 10:54:57 +0000	[diff] [blame]	223	.. cfunction:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
				224
Georg Brandl	b8d0e36	2010-11-26 07:53:50 +0000	[diff] [blame]	225	Create a Unicode object from the char buffer u. The bytes will be interpreted
Georg Brandl	79cdff0	2010-10-17 10:54:57 +0000	[diff] [blame]	226	as being UTF-8 encoded. u may also be NULL which
				227	causes the contents to be undefined. It is the user's responsibility to fill in
				228	the needed data. The buffer is copied into the new object. If the buffer is not
				229	NULL, the return value might be a shared object. Therefore, modification of
				230	the resulting Unicode object is only allowed when u is NULL.
				231
				232	.. versionadded:: 2.6
				233
				234
.. cfunction:: PyObject* PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.
				239
				240	.. versionadded:: 2.6
				241
				242
				243	.. cfunction:: PyObject* PyUnicode_FromFormat(const char *format, ...)
				244
				245	Take a C :cfunc:`printf`\ -style format string and a variable number of
				246	arguments, calculate the size of the resulting Python unicode string and return
				247	a string with the values formatted into it. The variable arguments must be C
				248	types and must correspond exactly to the format characters in the format
				249	string. The following format characters are allowed:
				250
				251	.. % The descriptions for %zd and %zu are wrong, but the truth is complicated
				252	.. % because not all compilers support the %z width modifier -- we fake it
				253	.. % when necessary via interpolating PY_FORMAT_SIZE_T.
				254
   +-------------------+---------------------+--------------------------------+
   | Format Characters | Type                | Comment                        |
   +===================+=====================+================================+
   | :attr:`%%`        | n/a                 | The literal % character.       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%c`        | int                 | A single character,            |
   |                   |                     | represented as a C int.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%d`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%d")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%u`        | unsigned int        | Exactly equivalent to          |
   |                   |                     | ``printf("%u")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%ld`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%ld")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
   |                   |                     | ``printf("%lu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zd")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zu`       | size_t              | Exactly equivalent to          |
   |                   |                     | ``printf("%zu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%i`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%i")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%x`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%x")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%s`        | char\*              | A null-terminated C character  |
   |                   |                     | array.                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%p`        | void\*              | The hex representation of a C  |
   |                   |                     | pointer. Mostly equivalent to  |
   |                   |                     | ``printf("%p")`` except that   |
   |                   |                     | it is guaranteed to start with |
   |                   |                     | the literal ``0x`` regardless  |
   |                   |                     | of what the platform's         |
   |                   |                     | ``printf`` yields.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%U`        | PyObject\*          | A unicode object.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
   |                   |                     | NULL) and a null-terminated    |
   |                   |                     | C character array as a second  |
   |                   |                     | parameter (which will be used, |
   |                   |                     | if the first parameter is      |
   |                   |                     | NULL).                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling          |
   |                   |                     | :func:`PyObject_Unicode`.      |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling          |
   |                   |                     | :func:`PyObject_Repr`.         |
   +-------------------+---------------------+--------------------------------+
				313
				314	An unrecognized format character causes all the rest of the format string to be
				315	copied as-is to the result string, and any extra arguments discarded.
				316
				317	.. versionadded:: 2.6
				318
				319
				320	.. cfunction:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
				321
   Identical to :cfunc:`PyUnicode_FromFormat` except that it takes exactly two
   arguments: the format string and a ``va_list``.
				324
				325	.. versionadded:: 2.6
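
The "calculate the size, then format" approach described for the two formatting functions can be sketched in standalone C. This is only an analogy, assuming a C99 ``vsnprintf``: the names ``format_alloc`` and ``format_alloc_v`` are hypothetical, the result is a ``malloc``'ed C string rather than a Unicode object, and none of the Python-specific format characters (``%U``, ``%V``, ``%S``, ``%R``) are handled.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical va_list variant: measure first, then format into a buffer
   of exactly the required size. Returns NULL on failure. */
char *
format_alloc_v(const char *format, va_list vargs)
{
    va_list count_args;
    int needed;
    char *result;

    assert(format != NULL);
    /* First pass: only compute the length, using a copy of the arguments. */
    va_copy(count_args, vargs);
    needed = vsnprintf(NULL, 0, format, count_args);
    va_end(count_args);
    if (needed < 0)
        return NULL;

    result = malloc((size_t)needed + 1);
    if (result == NULL)
        return NULL;
    /* Second pass: write the formatted text into the sized buffer. */
    vsnprintf(result, (size_t)needed + 1, format, vargs);
    return result;
}

/* Hypothetical variadic front end: it only packs its arguments into a
   va_list and delegates to the two-argument variant. */
char *
format_alloc(const char *format, ...)
{
    va_list vargs;
    char *result;

    va_start(vargs, format);
    result = format_alloc_v(format, vargs);
    va_end(vargs);
    return result;
}
```

The variadic function is a thin wrapper around the ``va_list`` variant, which is the usual reason for offering both forms: other variadic functions can forward their arguments to the second one.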
				326
				327
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	328	.. cfunction:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
				329
				330	Return a read-only pointer to the Unicode object's internal :ctype:`Py_UNICODE`
				331	buffer, NULL if unicode is not a Unicode object.
				332
				333
				334	.. cfunction:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
				335
				336	Return the length of the Unicode object.
				337
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	338	.. versionchanged:: 2.5
				339	This function returned an :ctype:`int` type. This might require changes
				340	in your code for properly supporting 64-bit systems.
				341
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	342
.. cfunction:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.
				347
				348	String and other char buffer compatible objects are decoded according to the
				349	given encoding and using the error handling defined by errors. Both can be
				350	NULL to have the interface use the default values (see the next section for
				351	details).
				352
				353	All other objects, including Unicode objects, cause a :exc:`TypeError` to be
				354	set.
				355
				356	The API returns NULL if there was an error. The caller is responsible for
				357	decref'ing the returned objects.
				358
				359
				360	.. cfunction:: PyObject* PyUnicode_FromObject(PyObject *obj)
				361
				362	Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
				363	throughout the interpreter whenever coercion to Unicode is needed.
				364
				365	If the platform supports :ctype:`wchar_t` and provides a header file wchar.h,
				366	Python can interface directly to this type using the following functions.
				367	Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
				368	the system's :ctype:`wchar_t`.
				369
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	370
wchar_t Support
"""""""""""""""
				373
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	374	:ctype:`wchar_t` support for platforms which support it:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	375
				376	.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
				377
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	378	Create a Unicode object from the :ctype:`wchar_t` buffer w of the given size.
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	379	Return NULL on failure.
				380
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	381	.. versionchanged:: 2.5
				382	This function used an :ctype:`int` type for size. This might require
				383	changes in your code for properly supporting 64-bit systems.
				384
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	385
.. cfunction:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)
				387
				388	Copy the Unicode object contents into the :ctype:`wchar_t` buffer w. At most
				389	size :ctype:`wchar_t` characters are copied (excluding a possibly trailing
				390	0-termination character). Return the number of :ctype:`wchar_t` characters
				391	copied or -1 in case of an error. Note that the resulting :ctype:`wchar_t`
				392	string may or may not be 0-terminated. It is the responsibility of the caller
				393	to make sure that the :ctype:`wchar_t` string is 0-terminated in case this is
				394	required by the application.
				395
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	396	.. versionchanged:: 2.5
				397	This function returned an :ctype:`int` type and used an :ctype:`int`
				398	type for size. This might require changes in your code for properly
				399	supporting 64-bit systems.
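
Because the copied string may or may not be 0-terminated, callers typically reserve one extra slot and terminate the buffer themselves. The standalone sketch below models that pattern without the Python API. Both function names are hypothetical, and ``copy_wide`` is only a stand-in that assumes a terminator is written when, and only when, room is left in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in for a bounded copy: copy at most size characters
   and add a terminator only if there is room left. Returns the number of
   characters copied, not counting any terminator. */
ptrdiff_t
copy_wide(const wchar_t *src, size_t srclen, wchar_t *dst, size_t size)
{
    size_t n, i;

    assert(src != NULL && dst != NULL);
    n = srclen < size ? srclen : size;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
    if (n < size)
        dst[n] = L'\0';
    return (ptrdiff_t)n;
}

/* Caller-side pattern: hand over one slot less than the buffer capacity
   and terminate unconditionally afterwards. */
ptrdiff_t
copy_wide_terminated(const wchar_t *src, size_t srclen,
                     wchar_t *dst, size_t dstcap)
{
    ptrdiff_t n;

    if (dstcap == 0)
        return -1;
    n = copy_wide(src, srclen, dst, dstcap - 1);
    dst[n] = L'\0';
    return n;
}
```

With a four-element buffer and a six-character source, the plain copy fills all four slots and leaves no terminator, while the wrapper copies three characters and terminates the result.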
				400
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	401
				402	.. _builtincodecs:
				403
Built-in Codecs
^^^^^^^^^^^^^^^
				406
Georg Brandl	d7d4fd7	2009-07-26 14:37:28 +0000	[diff] [blame]	407	Python provides a set of built-in codecs which are written in C for speed. All of
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	408	these codecs are directly usable via the following functions.
				409
Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as those of the built-in :func:`unicode` Unicode
object constructor.
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	413
				414	Setting encoding to NULL causes the default encoding to be used which is
				415	ASCII. The file system calls should use :cdata:`Py_FileSystemDefaultEncoding`
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	416	as the encoding for file names. This variable should be treated as read-only: on
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	417	some systems, it will be a pointer to a static string, on others, it will change
				418	at run-time (such as when the application invokes setlocale).
				419
				420	Error handling is set by errors which may also be set to NULL meaning to use
				421	the default handling defined for the codec. Default error handling for all
Georg Brandl	d7d4fd7	2009-07-26 14:37:28 +0000	[diff] [blame]	422	built-in codecs is "strict" (:exc:`ValueError` is raised).
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	423
The codecs all use a similar interface. For simplicity, only deviations from
the following generic ones are documented.
				426
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	427
Generic Codecs
""""""""""""""
				430
				431	These are the generic codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	432
				433
.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
				435
				436	Create a Unicode object by decoding size bytes of the encoded string s.
				437	encoding and errors have the same meaning as the parameters of the same name
Georg Brandl	d7d4fd7	2009-07-26 14:37:28 +0000	[diff] [blame]	438	in the :func:`unicode` built-in function. The codec to be used is looked up
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	439	using the Python codec registry. Return NULL if an exception was raised by
				440	the codec.
				441
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	442	.. versionchanged:: 2.5
				443	This function used an :ctype:`int` type for size. This might require
				444	changes in your code for properly supporting 64-bit systems.
				445
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	446
.. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
				448
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	449	Encode the :ctype:`Py_UNICODE` buffer s of the given size and return a Python
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	450	string object. encoding and errors have the same meaning as the parameters
				451	of the same name in the Unicode :meth:`encode` method. The codec to be used is
				452	looked up using the Python codec registry. Return NULL if an exception was
				453	raised by the codec.
				454
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	455	.. versionchanged:: 2.5
				456	This function used an :ctype:`int` type for size. This might require
				457	changes in your code for properly supporting 64-bit systems.
				458
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	459
.. cfunction:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
				461
				462	Encode a Unicode object and return the result as Python string object.
				463	encoding and errors have the same meaning as the parameters of the same name
				464	in the Unicode :meth:`encode` method. The codec to be used is looked up using
				465	the Python codec registry. Return NULL if an exception was raised by the
				466	codec.
				467
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	468
UTF-8 Codecs
""""""""""""
				471
				472	These are the UTF-8 codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	473
				474
.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
				476
				477	Create a Unicode object by decoding size bytes of the UTF-8 encoded string
				478	s. Return NULL if an exception was raised by the codec.
				479
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	480	.. versionchanged:: 2.5
				481	This function used an :ctype:`int` type for size. This might require
				482	changes in your code for properly supporting 64-bit systems.
				483
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	484
.. cfunction:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
				486
				487	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF8`. If
				488	consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be
				489	treated as an error. Those bytes will not be decoded and the number of bytes
				490	that have been decoded will be stored in consumed.
				491
				492	.. versionadded:: 2.4
				493
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	494	.. versionchanged:: 2.5
				495	This function used an :ctype:`int` type for size. This might require
				496	changes in your code for properly supporting 64-bit systems.
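
The idea of holding back a trailing incomplete sequence can be sketched in standalone C. The helper below is hypothetical and is not how the decoder is implemented. It only inspects the end of the buffer to work out how many leading bytes could be decoded now, which corresponds to the value a stateful decoder would report through *consumed*, and it performs no validation of the rest of the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the number of leading bytes of s that form
   complete UTF-8 sequences, i.e. size minus any incomplete multi-byte
   sequence at the very end of the buffer. */
size_t
utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t back, start, need;
    unsigned char lead;

    assert(s != NULL || size == 0);
    /* Walk back over at most three trailing continuation bytes (10xxxxxx). */
    back = 0;
    while (back < size && back < 3 && (s[size - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == size)
        return size;          /* empty input or no lead byte in sight */

    start = size - 1 - back;  /* index of the candidate lead byte */
    lead = s[start];
    if ((lead & 0x80) == 0x00)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return size;          /* invalid lead byte: left to the decoder */

    /* Hold the last sequence back if fewer bytes are present than needed. */
    return (back + 1 < need) ? start : size;
}
```

A caller that reads data in chunks would decode the reported prefix and prepend the remaining bytes to the next chunk.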
				497
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	498
.. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				500
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	501	Encode the :ctype:`Py_UNICODE` buffer s of the given size using UTF-8 and return a
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	502	Python string object. Return NULL if an exception was raised by the codec.
				503
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	504	.. versionchanged:: 2.5
				505	This function used an :ctype:`int` type for size. This might require
				506	changes in your code for properly supporting 64-bit systems.
				507
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	508
				509	.. cfunction:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
				510
				511	Encode a Unicode object using UTF-8 and return the result as Python string
				512	object. Error handling is "strict". Return NULL if an exception was raised
				513	by the codec.
				514
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	515
UTF-32 Codecs
"""""""""""""
				518
				519	These are the UTF-32 codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	520
				521
.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
				523
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	524	Decode size bytes from a UTF-32 encoded buffer string and return the
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	525	corresponding Unicode object. errors (if non-NULL) defines the error
				526	handling. It defaults to "strict".
				527
				528	If byteorder is non-NULL, the decoder starts decoding using the given byte
				529	order::
				530
				531	*byteorder == -1: little endian
				532	*byteorder == 0: native order
				533	*byteorder == 1: big endian
				534
Georg Brandl	579a358	2009-09-18 21:35:59 +0000	[diff] [blame]	535	If ``*byteorder`` is zero, and the first four bytes of the input data are a
				536	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				537	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				538	``1``, any byte order mark is copied to the output.
				539
   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	542
				543	In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.
				544
				545	If byteorder is NULL, the codec starts in native order mode.
				546
				547	Return NULL if an exception was raised by the codec.
				548
				549	.. versionadded:: 2.6
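
The byte order mark rule described above can be sketched as a small standalone function. It is plain C that does not call the Python API, and the name ``utf32_check_bom`` is hypothetical. It assumes the BOM byte patterns ``FF FE 00 00`` (little endian) and ``00 00 FE FF`` (big endian), and it only reports how many leading bytes a decoder would skip; the actual decoding is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: apply the BOM rule to the start of a UTF-32 buffer.
   On entry *byteorder is -1, 0 or 1. Returns the number of bytes to skip
   (4 if a BOM was consumed, otherwise 0) and updates *byteorder when a BOM
   selects the byte order. */
size_t
utf32_check_bom(const unsigned char *s, size_t size, int *byteorder)
{
    assert(s != NULL && byteorder != NULL);
    if (*byteorder != 0 || size < 4)
        return 0;             /* explicit order: any BOM stays in the data */
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
        *byteorder = -1;      /* little endian BOM */
        return 4;
    }
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
        *byteorder = 1;       /* big endian BOM */
        return 4;
    }
    return 0;                 /* no BOM: the caller decodes in native order */
}
```

When the caller passes ``-1`` or ``1``, the function leaves the buffer untouched, which matches the statement that an explicit byte order causes any BOM to be copied to the output.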
				550
				551
.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
				553
				554	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
				555	consumed is not NULL, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
				556	trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
				557	by four) as an error. Those bytes will not be decoded and the number of bytes
				558	that have been decoded will be stored in consumed.
				559
				560	.. versionadded:: 2.6
				561
				562
.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
				564
				565	Return a Python bytes object holding the UTF-32 encoded value of the Unicode
Georg Brandl	579a358	2009-09-18 21:35:59 +0000	[diff] [blame]	566	data in s. Output is written according to the following byte order::
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	567
				568	byteorder == -1: little endian
				569	byteorder == 0: native byte order (writes a BOM mark)
				570	byteorder == 1: big endian
				571
				572	If byteorder is ``0``, the output string will always start with the Unicode BOM
				573	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				574
				575	If Py_UNICODE_WIDE is not defined, surrogate pairs will be output
				576	as a single codepoint.
				577
				578	Return NULL if an exception was raised by the codec.
				579
				580	.. versionadded:: 2.6
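
As a minimal sketch of the three *byteorder* modes on the encoding side, the standalone function below writes code points as UTF-32 bytes and prepends a BOM only in native mode. The names ``encode_utf32``, ``put_u32`` and ``is_little_endian`` are hypothetical, the input is assumed to be an array of complete code points (no surrogate pair handling), and the caller is assumed to provide an output buffer of at least ``4 * (n + 1)`` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Detect the native byte order of the machine running the code. */
static int
is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Write v as four bytes in the requested order; return the next position. */
static unsigned char *
put_u32(unsigned char *p, uint32_t v, int little)
{
    int i;

    for (i = 0; i < 4; i++) {
        int shift = little ? 8 * i : 8 * (3 - i);
        p[i] = (unsigned char)((v >> shift) & 0xFF);
    }
    return p + 4;
}

/* Hypothetical encoder: byteorder is -1 (little), 0 (native, with BOM)
   or 1 (big). Returns the number of bytes written to out. */
size_t
encode_utf32(const uint32_t *cps, size_t n, int byteorder, unsigned char *out)
{
    unsigned char *p = out;
    int little;
    size_t i;

    assert(byteorder >= -1 && byteorder <= 1);
    little = (byteorder == -1) || (byteorder == 0 && is_little_endian());
    if (byteorder == 0)
        p = put_u32(p, 0xFEFF, little);   /* BOM only in native mode */
    for (i = 0; i < n; i++)
        p = put_u32(p, cps[i], little);
    return (size_t)(p - out);
}
```

Encoding a single code point therefore produces four bytes in the two explicit modes and eight bytes in native mode, the first four being the BOM.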
				581
				582
				583	.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				584
				585	Return a Python string using the UTF-32 encoding in native byte order. The
				586	string always starts with a BOM mark. Error handling is "strict". Return
				587	NULL if an exception was raised by the codec.
				588
				589	.. versionadded:: 2.6
				590
				591
UTF-16 Codecs
"""""""""""""
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	594
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	595	These are the UTF-16 codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	596
				597
.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
				599
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	600	Decode size bytes from a UTF-16 encoded buffer string and return the
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	601	corresponding Unicode object. errors (if non-NULL) defines the error
				602	handling. It defaults to "strict".
				603
				604	If byteorder is non-NULL, the decoder starts decoding using the given byte
				605	order::
				606
				607	*byteorder == -1: little endian
				608	*byteorder == 0: native order
				609	*byteorder == 1: big endian
				610
Georg Brandl	579a358	2009-09-18 21:35:59 +0000	[diff] [blame]	611	If ``*byteorder`` is zero, and the first two bytes of the input data are a
				612	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				613	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				614	``1``, any byte order mark is copied to the output (where it will result in
				615	either a ``\ufeff`` or a ``\ufffe`` character).
				616
   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	619
				620	If byteorder is NULL, the codec starts in native order mode.
				621
				622	Return NULL if an exception was raised by the codec.
				623
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	624	.. versionchanged:: 2.5
				625	This function used an :ctype:`int` type for size. This might require
				626	changes in your code for properly supporting 64-bit systems.
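
The BOM rules above can be observed from the Python level. This is a minimal
sketch, assuming that the ``utf-16`` codec name corresponds to
``*byteorder == 0`` and ``utf-16-le`` to ``*byteorder == -1``; the codec names
and variable names are illustrative and not part of this C API:

```python
import codecs

# Little-endian BOM followed by "hi" encoded as little-endian UTF-16.
data = codecs.BOM_UTF16_LE + u"hi".encode("utf-16-le")

# Assumption: "utf-16" behaves like *byteorder == 0. The BOM selects the
# byte order and is not copied into the result.
auto_detected = data.decode("utf-16")

# Assumption: "utf-16-le" behaves like *byteorder == -1. The BOM is decoded
# as an ordinary character and appears as U+FEFF.
forced_le = data.decode("utf-16-le")

print(repr(auto_detected))
print(repr(forced_le))
```

A caller that forces a byte order therefore has to strip a leading U+FEFF
itself if the input may carry a BOM.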
				627
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	628
				629	.. cfunction:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
				630
				631	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF16`. If
				632	consumed is not NULL, :cfunc:`PyUnicode_DecodeUTF16Stateful` will not treat
				633	trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
				634	split surrogate pair) as an error. Those bytes will not be decoded and the
				635	number of bytes that have been decoded will be stored in consumed.
				636
				637	.. versionadded:: 2.4
				638
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	639	.. versionchanged:: 2.5
				640	This function used an :ctype:`int` type for size and an :ctype:`int *`
				641	type for consumed. This might require changes in your code for
				642	properly supporting 64-bit systems.
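
The effect of a non-NULL consumed can be sketched with Python's incremental
decoder, which keeps the undecoded tail between calls much as a C caller would
using consumed. This assumes the incremental ``utf-16-le`` decoder is backed
by the stateful decoder; the chunk boundaries are hypothetical:

```python
import codecs

# Hypothetical chunking of a little-endian UTF-16 stream: the first chunk
# ends in the middle of a code unit (odd number of bytes).
chunks = [b"h\x00i", b"\x00"]

decoder = codecs.getincrementaldecoder("utf-16-le")()

pieces = []
for chunk in chunks:
    # Incomplete trailing bytes are held back instead of raising an error.
    pieces.append(decoder.decode(chunk))

# Signal the end of input; leftover bytes would now be reported as an error.
pieces.append(decoder.decode(b"", final=True))

text = u"".join(pieces)
print(repr(pieces), repr(text))
```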
				643
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	644
				645	.. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
				646
				647	Return a Python string object holding the UTF-16 encoded value of the Unicode
Georg Brandl	579a358	2009-09-18 21:35:59 +0000	[diff] [blame]	648	data in s. Output is written according to the following byte order::
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	649
				650	byteorder == -1: little endian
				651	byteorder == 0: native byte order (writes a BOM)
				652	byteorder == 1: big endian
				653
				654	If byteorder is ``0``, the output string will always start with the Unicode
				655	byte order mark (U+FEFF). In the other two modes, no BOM is prepended.
				656
				657	If ``Py_UNICODE_WIDE`` is defined, a single :ctype:`Py_UNICODE` value may get
				658	represented as a surrogate pair. If it is not defined, each :ctype:`Py_UNICODE`
				659	value is interpreted as a UCS-2 character.
				660
				661	Return NULL if an exception was raised by the codec.
				662
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	663	.. versionchanged:: 2.5
				664	This function used an :ctype:`int` type for size. This might require
				665	changes in your code for properly supporting 64-bit systems.
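
The three byte order modes can be compared from the Python level. This is a
minimal sketch, assuming the codec names ``utf-16``, ``utf-16-le`` and
``utf-16-be`` correspond to byteorder ``0``, ``-1`` and ``1`` respectively:

```python
import codecs

text = u"hi"

# Assumption: the codec names map onto the byteorder values noted here.
with_bom = text.encode("utf-16")      # native order, like byteorder == 0
little = text.encode("utf-16-le")     # like byteorder == -1, no BOM
big = text.encode("utf-16-be")        # like byteorder == 1, no BOM

# A character outside the Basic Multilingual Plane (U+1F600) is written
# as the surrogate pair D83D DE00.
pair = u"\U0001F600".encode("utf-16-be")

print(repr(with_bom), repr(little), repr(big), repr(pair))
```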
				666
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	667
				668	.. cfunction:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				669
				670	Return a Python string using the UTF-16 encoding in native byte order. The
				671	string always starts with a BOM. Error handling is "strict". Return
				672	NULL if an exception was raised by the codec.
				673
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	674
Georg Brandl	7d4bfb3	2010-08-02 21:44:25 +0000	[diff] [blame]	675	UTF-7 Codecs
				676	""""""""""""
				677
				678	These are the UTF-7 codec APIs:
				679
				680
				681	.. cfunction:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
				682
				683	Create a Unicode object by decoding size bytes of the UTF-7 encoded string
				684	s. Return NULL if an exception was raised by the codec.
				685
				686
Georg Brandl	21946af	2010-10-06 09:28:45 +0000	[diff] [blame]	687	.. cfunction:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
Georg Brandl	7d4bfb3	2010-08-02 21:44:25 +0000	[diff] [blame]	688
				689	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF7`. If
				690	consumed is not NULL, trailing incomplete UTF-7 base-64 sections will not
				691	be treated as an error. Those bytes will not be decoded and the number of
				692	bytes that have been decoded will be stored in consumed.
				693
				694
				695	.. cfunction:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)
				696
				697	Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-7 and
				698	return a Python bytes object. Return NULL if an exception was raised by
				699	the codec.
				700
				701	If base64SetO is nonzero, "Set O" (punctuation that has no otherwise
				702	special meaning) will be encoded in base-64. If base64WhiteSpace is
				703	nonzero, whitespace will be encoded in base-64. Both are set to zero for the
				704	Python "utf-7" codec.
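
Since the Python "utf-7" codec uses zero for both flags, its output shows what
the default settings produce. A minimal sketch; the sample strings are
arbitrary:

```python
# Characters outside ASCII are written as modified base-64 between "+" and "-".
euro = u"\u20ac".encode("utf-7")

# A literal "+" has to be escaped as "+-".
plus = u"a+b".encode("utf-7")

# With both flags at zero, "Set O" punctuation such as "!" and whitespace
# are written directly rather than in base-64.
plain = u"hi there!".encode("utf-7")

print(repr(euro), repr(plus), repr(plain))
```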
				705
				706
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	707	Unicode-Escape Codecs
				708	"""""""""""""""""""""
				709
				710	These are the "Unicode Escape" codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	711
				712
				713	.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
				714
				715	Create a Unicode object by decoding size bytes of the Unicode-Escape encoded
				716	string s. Return NULL if an exception was raised by the codec.
				717
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	718	.. versionchanged:: 2.5
				719	This function used an :ctype:`int` type for size. This might require
				720	changes in your code for properly supporting 64-bit systems.
				721
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	722
				723	.. cfunction:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
				724
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	725	Encode the :ctype:`Py_UNICODE` buffer of the given size using Unicode-Escape and
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	726	return a Python string object. Return NULL if an exception was raised by the
				727	codec.
				728
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	729	.. versionchanged:: 2.5
				730	This function used an :ctype:`int` type for size. This might require
				731	changes in your code for properly supporting 64-bit systems.
				732
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	733
				734	.. cfunction:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				735
				736	Encode a Unicode object using Unicode-Escape and return the result as a Python
				737	string object. Error handling is "strict". Return NULL if an exception was
				738	raised by the codec.
				739
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	740
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	741	Raw-Unicode-Escape Codecs
				742	"""""""""""""""""""""""""
				743
				744	These are the "Raw Unicode Escape" codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	745
				746
				747	.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
				748
				749	Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape
				750	encoded string s. Return NULL if an exception was raised by the codec.
				751
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	752	.. versionchanged:: 2.5
				753	This function used an :ctype:`int` type for size. This might require
				754	changes in your code for properly supporting 64-bit systems.
				755
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	756
				757	.. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				758
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	759	Encode the :ctype:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	760	and return a Python string object. Return NULL if an exception was raised by
				761	the codec.
				762
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	763	.. versionchanged:: 2.5
				764	This function used an :ctype:`int` type for size. This might require
				765	changes in your code for properly supporting 64-bit systems.
				766
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	767
				768	.. cfunction:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				769
				770	Encode a Unicode object using Raw-Unicode-Escape and return the result as a
				771	Python string object. Error handling is "strict". Return NULL if an exception
				772	was raised by the codec.
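
The difference between the Unicode-Escape and Raw-Unicode-Escape codecs is
easiest to see side by side. A minimal sketch using the Python-level codec
names ``unicode_escape`` and ``raw_unicode_escape``; the sample text is
arbitrary:

```python
text = u"caf\xe9\n\u20ac"

# Unicode-Escape: everything outside printable ASCII becomes an escape
# sequence, including the newline and the Latin-1 character.
escaped = text.encode("unicode_escape")

# Raw-Unicode-Escape: only code points above U+00FF are escaped; the
# newline and the Latin-1 byte pass through unchanged.
raw_escaped = text.encode("raw_unicode_escape")

print(repr(escaped))
print(repr(raw_escaped))
```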
				773
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	774
				775	Latin-1 Codecs
				776	""""""""""""""
				777
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	778	These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
				779	ordinals, and only these are accepted by the codecs during encoding.
				780
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	781
				782	.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
				783
				784	Create a Unicode object by decoding size bytes of the Latin-1 encoded string
				785	s. Return NULL if an exception was raised by the codec.
				786
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	787	.. versionchanged:: 2.5
				788	This function used an :ctype:`int` type for size. This might require
				789	changes in your code for properly supporting 64-bit systems.
				790
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	791
				792	.. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				793
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	794	Encode the :ctype:`Py_UNICODE` buffer of the given size using Latin-1 and return
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	795	a Python string object. Return NULL if an exception was raised by the codec.
				796
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	797	.. versionchanged:: 2.5
				798	This function used an :ctype:`int` type for size. This might require
				799	changes in your code for properly supporting 64-bit systems.
				800
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	801
				802	.. cfunction:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				803
				804	Encode a Unicode object using Latin-1 and return the result as a Python string
				805	object. Error handling is "strict". Return NULL if an exception was raised
				806	by the codec.
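
The 256-ordinal limit of Latin-1 can be checked from the Python level. A
minimal sketch using the ``latin-1`` codec name; the variable names are
illustrative:

```python
# Ordinals below 256 encode to the byte of the same value.
encoded = u"caf\xe9".encode("latin-1")

# Ordinals of 256 and above are rejected under "strict" error handling.
try:
    u"\u20ac".encode("latin-1")
    rejected = False
except UnicodeEncodeError:
    rejected = True

# Decoding maps every byte to the code point of the same value.
decoded = bytes(bytearray(range(256))).decode("latin-1")

print(repr(encoded), rejected, len(decoded))
```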
				807
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	808
				809	ASCII Codecs
				810	""""""""""""
				811
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	812	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				813	codes generate errors.
				814
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	815
				816	.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
				817
				818	Create a Unicode object by decoding size bytes of the ASCII encoded string
				819	s. Return NULL if an exception was raised by the codec.
				820
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	821	.. versionchanged:: 2.5
				822	This function used an :ctype:`int` type for size. This might require
				823	changes in your code for properly supporting 64-bit systems.
				824
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	825
				826	.. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				827
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	828	Encode the :ctype:`Py_UNICODE` buffer of the given size using ASCII and return a
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	829	Python string object. Return NULL if an exception was raised by the codec.
				830
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	831	.. versionchanged:: 2.5
				832	This function used an :ctype:`int` type for size. This might require
				833	changes in your code for properly supporting 64-bit systems.
				834
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	835
				836	.. cfunction:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				837
				838	Encode a Unicode object using ASCII and return the result as a Python string
				839	object. Error handling is "strict". Return NULL if an exception was raised
				840	by the codec.
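
How the errors argument changes the treatment of non-ASCII input can be
sketched with the Python-level ``ascii`` codec. The handler names ``replace``
and ``ignore`` are standard codec error handlers; the input bytes are
arbitrary:

```python
data = b"abc\xff"

# With the default "strict" handling, a byte above 0x7F raises an error.
try:
    data.decode("ascii")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# "replace" substitutes U+FFFD for the offending byte.
replaced = data.decode("ascii", "replace")

# "ignore" drops the offending byte.
ignored = data.decode("ascii", "ignore")

print(strict_failed, repr(replaced), repr(ignored))
```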
				841
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	842
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	843	Character Map Codecs
				844	""""""""""""""""""""
				845
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	846	This codec is special in that it can be used to implement many different codecs
				847	(and this is in fact what was done to obtain most of the standard codecs
				848	included in the :mod:`encodings` package). The codec uses mappings to encode and
				849	decode characters.
				850
				851	Decoding mappings must map single string characters to single Unicode
				852	characters, integers (which are then interpreted as Unicode ordinals) or None
				853	(meaning "undefined mapping" and causing an error).
				854
				855	Encoding mappings must map single Unicode characters to single string
				856	characters, integers (which are then interpreted as Latin-1 ordinals) or None
				857	(meaning "undefined mapping" and causing an error).
				858
				859	The mapping objects provided need only support the :meth:`__getitem__` mapping
				860	interface.
				861
				862	If a character lookup fails with a :exc:`LookupError`, the character is copied
				863	as-is, meaning that its ordinal value is interpreted as a Unicode or Latin-1
				864	ordinal, respectively. Because of this, mappings only need to contain entries
				865	for characters that map to different code points.
				866
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	867	These are the mapping codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	868
				869	.. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				870
				871	Create a Unicode object by decoding size bytes of the encoded string s using
				872	the given mapping object. Return NULL if an exception was raised by the
				873	codec. If mapping is NULL, Latin-1 decoding will be done. Otherwise it can be
				874	either a dictionary mapping byte values, or a Unicode string, which is treated
				875	as a lookup table. Byte values greater than the length of the string and U+FFFE
				876	"characters" are treated as "undefined mapping".
				877
				878	.. versionchanged:: 2.4
				879	Allowed unicode string as mapping argument.
				880
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	881	.. versionchanged:: 2.5
				882	This function used an :ctype:`int` type for size. This might require
				883	changes in your code for properly supporting 64-bit systems.
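
Both kinds of mapping argument can be tried through ``codecs.charmap_decode``,
the helper the :mod:`encodings` modules call. This is a minimal sketch,
assuming that helper forwards to the C function described here; the tables are
hypothetical and far smaller than a real code page:

```python
import codecs

# Hypothetical table: byte 0 maps to a character, byte 1 to an integer
# (taken as a Unicode ordinal), byte 2 to None (undefined mapping).
table = {0: u"a", 1: 0x20AC, 2: None}

text, consumed = codecs.charmap_decode(b"\x00\x01", "strict", table)

# A Unicode string works as a lookup table indexed by byte value.
text_from_string, _ = codecs.charmap_decode(b"\x01\x00", "strict", u"ab")

# A None entry is an undefined mapping and raises under "strict".
try:
    codecs.charmap_decode(b"\x02", "strict", table)
    undefined_rejected = False
except UnicodeDecodeError:
    undefined_rejected = True

print(repr(text), consumed, repr(text_from_string), undefined_rejected)
```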
				884
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	885
				886	.. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				887
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	888	Encode the :ctype:`Py_UNICODE` buffer of the given size using the given
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	889	mapping object and return a Python string object. Return NULL if an
				890	exception was raised by the codec.
				891
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	892	.. versionchanged:: 2.5
				893	This function used an :ctype:`int` type for size. This might require
				894	changes in your code for properly supporting 64-bit systems.
				895
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	896
				897	.. cfunction:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
				898
				899	Encode a Unicode object using the given mapping object and return the result
				900	as a Python string object. Error handling is "strict". Return NULL if an
				901	exception was raised by the codec.
				902
				903	The following codec API is special in that it maps Unicode to Unicode.
				904
				905
				906	.. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
				907
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	908	Translate a :ctype:`Py_UNICODE` buffer of the given size by applying a
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	909	character mapping table to it and return the resulting Unicode object. Return
				910	NULL when an exception was raised by the codec.
				911
				912	The mapping table must map Unicode ordinal integers to Unicode ordinal
				913	integers or None (causing deletion of the character).
				914
				915	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				916	and sequences work well. Unmapped character ordinals (ones which cause a
				917	:exc:`LookupError`) are left untouched and are copied as-is.
				918
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	919	.. versionchanged:: 2.5
				920	This function used an :ctype:`int` type for size. This might require
				921	changes in your code for properly supporting 64-bit systems.
				922
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	923
				924	MBCS codecs for Windows
				925	"""""""""""""""""""""""
				926
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	927	These are the MBCS codec APIs. They are currently only available on Windows and
				928	use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
				929	DBCS) is a class of encodings, not just one. The target encoding is defined by
				930	the user settings on the machine running the codec.
				931
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	932
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	933	.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
				934
				935	Create a Unicode object by decoding size bytes of the MBCS encoded string s.
				936	Return NULL if an exception was raised by the codec.
				937
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	938	.. versionchanged:: 2.5
				939	This function used an :ctype:`int` type for size. This might require
				940	changes in your code for properly supporting 64-bit systems.
				941
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	942
				943	.. cfunction:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
				944
				945	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeMBCS`. If
				946	consumed is not NULL, :cfunc:`PyUnicode_DecodeMBCSStateful` will not decode a
				947	trailing lead byte, and the number of bytes that have been decoded will be stored
				948	in consumed.
				949
				950	.. versionadded:: 2.5
				951
				952
				953	.. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				954
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	955	Encode the :ctype:`Py_UNICODE` buffer of the given size using MBCS and return a
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	956	Python string object. Return NULL if an exception was raised by the codec.
				957
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	958	.. versionchanged:: 2.5
				959	This function used an :ctype:`int` type for size. This might require
				960	changes in your code for properly supporting 64-bit systems.
				961
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	962
				963	.. cfunction:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
				964
				965	Encode a Unicode object using MBCS and return the result as a Python string
				966	object. Error handling is "strict". Return NULL if an exception was raised
				967	by the codec.
				968
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	969
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	970	Methods & Slots
				971	"""""""""""""""
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	972
				973	.. _unicodemethodsandslots:
				974
				975	Methods and Slot Functions
				976	^^^^^^^^^^^^^^^^^^^^^^^^^^
				977
				978	The following APIs are capable of handling Unicode objects and strings on input
				979	(we refer to them as strings in the descriptions) and return Unicode objects or
				980	integers as appropriate.
				981
				982	They all return NULL or ``-1`` if an exception occurs.
				983
				984
				985	.. cfunction:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
				986
				987	Concatenate two strings, giving a new Unicode string.
				988
				989
				990	.. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
				991
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	992	Split a string giving a list of Unicode strings. If sep is NULL, splitting
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	993	will be done at all whitespace substrings. Otherwise, splits occur at the given
				994	separator. At most maxsplit splits will be done. If negative, no limit is
				995	set. Separators are not included in the resulting list.
				996
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	997	.. versionchanged:: 2.5
				998	This function used an :ctype:`int` type for maxsplit. This might require
				999	changes in your code for properly supporting 64-bit systems.
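
The sep and maxsplit rules mirror the Python-level ``split`` method. A minimal
sketch, assuming ``None`` at the Python level plays the role of a NULL sep;
the sample strings are arbitrary:

```python
# Assumption: sep=None corresponds to passing NULL for sep, so runs of
# whitespace act as a single separator. A negative maxsplit means no limit.
words = u"  one two\tthree  ".split(None, -1)

# An explicit separator with maxsplit == 2: at most two splits are made,
# and the separators themselves are dropped from the result.
limited = u"a,b,c,d".split(u",", 2)

print(words, limited)
```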
				1000
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1001
				1002	.. cfunction:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
				1003
				1004	Split a Unicode string at line breaks, returning a list of Unicode strings.
				1005	CRLF is considered to be one line break. If keepend is 0, the line break
				1006	characters are not included in the resulting strings.
				1007
				1008
				1009	.. cfunction:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
				1010
				1011	Translate a string by applying a character mapping table to it and return the
				1012	resulting Unicode object.
				1013
				1014	The mapping table must map Unicode ordinal integers to Unicode ordinal integers
				1015	or None (causing deletion of the character).
				1016
				1017	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				1018	and sequences work well. Unmapped character ordinals (ones which cause a
				1019	:exc:`LookupError`) are left untouched and are copied as-is.
				1020
				1021	errors has the usual meaning for codecs. It may be NULL, which indicates that
				1022	the default error handling should be used.
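
The shape of a mapping table is shown by the Python-level ``translate``
method, which accepts the same kind of table. A minimal sketch with a
hypothetical two-entry dictionary:

```python
# Keys and values are Unicode ordinals; None deletes the character.
table = {ord(u"l"): ord(u"L"), ord(u"o"): None}

# Characters without an entry raise LookupError inside the lookup and are
# copied unchanged.
result = u"hello world".translate(table)

print(repr(result))
```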
				1023
				1024
				1025	.. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
				1026
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	1027	Join a sequence of strings using the given separator and return the resulting
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1028	Unicode string.
				1029
				1030
				1031	.. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				1032
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	1033	Return 1 if substr matches ``str[start:end]`` at the given tail end
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1034	(direction == -1 means to do a prefix match, direction == 1 a suffix match),
				1035	0 otherwise. Return ``-1`` if an error occurred.
				1036
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	1037	.. versionchanged:: 2.5
				1038	This function used an :ctype:`int` type for start and end. This
				1039	might require changes in your code for properly supporting 64-bit
				1040	systems.
				1041
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1042
				1043	.. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				1044
Ezio Melotti	020f650	2011-04-14 07:39:06 +0300	[diff] [blame]	1045	Return the first position of substr in ``str[start:end]`` using the given
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1046	direction (direction == 1 means to do a forward search, direction == -1 a
				1047	backward search). The return value is the index of the first match; a value of
				1048	``-1`` indicates that no match was found, and ``-2`` indicates that an error
				1049	occurred and an exception has been set.
				1050
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	1051	.. versionchanged:: 2.5
				1052	This function used an :ctype:`int` type for start and end. This
				1053	might require changes in your code for properly supporting 64-bit
				1054	systems.
				1055
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1056
				1057	.. cfunction:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
				1058
				1059	Return the number of non-overlapping occurrences of substr in
				1060	``str[start:end]``. Return ``-1`` if an error occurred.
				1061
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	1062	.. versionchanged:: 2.5
				1063	This function returned an :ctype:`int` type and used an :ctype:`int`
				1064	type for start and end. This might require changes in your code for
				1065	properly supporting 64-bit systems.
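
The start, end and direction arguments of :cfunc:`PyUnicode_Tailmatch`,
:cfunc:`PyUnicode_Find` and :cfunc:`PyUnicode_Count` have close Python-level
counterparts. A minimal sketch, assuming the string methods below correspond
to the direction values noted in the comments:

```python
s = u"hello world"

# Assumption: startswith/endswith correspond to Tailmatch with
# direction == -1 and direction == 1.
is_prefix = s.startswith(u"hello", 0, len(s))
is_suffix = s.endswith(u"world", 0, len(s))

# Assumption: find/rfind correspond to Find with direction == 1 and
# direction == -1. A result of -1 means no match was found.
first = s.find(u"o", 0, len(s))
last = s.rfind(u"o", 0, len(s))
missing = s.find(u"z", 0, len(s))

# Count reports non-overlapping occurrences only.
count = u"aaaa".count(u"aa", 0, 4)

print(is_prefix, is_suffix, first, last, missing, count)
```

Note that the C function :cfunc:`PyUnicode_Find` reserves ``-2`` for errors,
whereas the Python methods raise an exception instead.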
				1066
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1067
				1068	.. cfunction:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
				1069
				1070	Replace at most maxcount occurrences of substr in str with replstr and
				1071	return the resulting Unicode object. maxcount == -1 means replace all
				1072	occurrences.
				1073
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	1074	.. versionchanged:: 2.5
				1075	This function used an :ctype:`int` type for maxcount. This might
				1076	require changes in your code for properly supporting 64-bit systems.
				1077
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1078
				1079	.. cfunction:: int PyUnicode_Compare(PyObject *left, PyObject *right)
				1080
				1081	Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
				1082	respectively.
				1083
				1084
				1085	.. cfunction:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
				1086
				1087	Rich compare two unicode strings and return one of the following:
				1088
				1089	* ``NULL`` in case an exception was raised
				1090	* :const:`Py_True` or :const:`Py_False` for successful comparisons
				1091	* :const:`Py_NotImplemented` in case the type combination is unknown
				1092
				1093	Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
				1094	:exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
				1095	with a :exc:`UnicodeDecodeError`.
				1096
				1097	Possible values for op are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
				1098	:const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
				1099
				1100
				1101	.. cfunction:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
				1102
				1103	Return a new string object from format and args; this is analogous to
				1104	``format % args``. The args argument must be a tuple.
				1105
				1106
				1107	.. cfunction:: int PyUnicode_Contains(PyObject *container, PyObject *element)
				1108
				1109	Check whether element is contained in container and return true or false
				1110	accordingly.
				1111
				1112	element has to coerce to a one-element Unicode string. ``-1`` is returned if
				1113	there was an error.