Blame - Doc/c-api/unicode.rst - platform/external/python/cpython2

Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1	.. highlightlang:: c
				2
				3	.. _unicodeobjects:
				4
				5	Unicode Objects and Codecs
				6	--------------------------
				7
				8	.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	9	.. sectionauthor:: Georg Brandl <georg@python.org>
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	10
				11	Unicode Objects
				12	^^^^^^^^^^^^^^^
				13
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	14	Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
				15	use a variety of representations, in order to allow handling the complete range
				16	of Unicode characters while staying memory efficient. There are special cases
				17	for strings where all code points are below 128, 256, or 65536; otherwise, code
				18	points must be below 1114112 (which is the full Unicode range).
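The thresholds above can be read as a rule for how many bytes each code point occupies. The following is a minimal sketch in plain C, not CPython's implementation; ``bytes_per_code_point`` is a hypothetical helper name, and it assumes (as in :pep:`393`) that the "below 128" and "below 256" cases both use one byte per code point.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of the C API: number of bytes needed per
   code point for a string whose largest code point is max_cp. */
static int bytes_per_code_point(uint32_t max_cp)
{
    assert(max_cp < 1114112);   /* must lie inside the Unicode range */
    if (max_cp < 256)
        return 1;               /* covers both the <128 and <256 cases */
    if (max_cp < 65536)
        return 2;
    return 4;                   /* anything up to 1114111 */
}
```

For instance, a string containing only ``U+0041`` and ``U+00E9`` would need one byte per code point under this rule, while a single ``U+1F600`` forces four.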
				19
				20	:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
Antoine Pitrou	b965b39	2011-10-22 22:08:05 +0200	[diff] [blame]	21	in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
				22	and inefficient; it should be avoided in performance- or memory-sensitive
				23	situations.
				24
				25	Due to the transition between the old APIs and the new APIs, unicode objects
				26	can internally be in two states depending on how they were created:
				27
				28	* "canonical" unicode objects are all objects created by a non-deprecated
				29	unicode API. They use the most efficient representation allowed by the
				30	implementation.
				31
				32	* "legacy" unicode objects have been created through one of the deprecated
				33	APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
				34	:c:type:`Py_UNICODE*` representation; you will have to call
				35	:c:func:`PyUnicode_READY` on them before calling any other API.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	36
				37
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	38	Unicode Type
				39	""""""""""""
				40
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	41	These are the basic Unicode object types used for the Unicode implementation in
				42	Python:
				43
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	44	.. c:type:: Py_UCS4
				45	Py_UCS2
				46	Py_UCS1
				47
				48	These types are typedefs for unsigned integer types wide enough to contain
				49	characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with
				50	single Unicode characters, use :c:type:`Py_UCS4`.
				51
				52	.. versionadded:: 3.3
				53
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	54
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	55	.. c:type:: Py_UNICODE
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	56
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	57	This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
				58	depending on the platform.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	59
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	60	.. versionchanged:: 3.3
				61	In previous versions, this was a 16-bit type or a 32-bit type depending on
				62	whether you selected a "narrow" or "wide" Unicode version of Python at
				63	build time.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	64
				65
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	66	.. c:type:: PyASCIIObject
				67	PyCompactUnicodeObject
				68	PyUnicodeObject
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	69
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	70	These subtypes of :c:type:`PyObject` represent a Python Unicode object. In
				71	almost all cases, they shouldn't be used directly, since all API functions
				72	that deal with Unicode objects take and return :c:type:`PyObject` pointers.
				73
				74	.. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	75
				76
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	77	.. c:var:: PyTypeObject PyUnicode_Type
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	78
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	79	This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	80	is exposed to Python code as ``str``.
				81
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	82
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	83	The following APIs are really C macros and can be used to do fast checks and to
				84	access internal read-only data of Unicode objects:
				85
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	86	.. c:function:: int PyUnicode_Check(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	87
				88	Return true if the object o is a Unicode object or an instance of a Unicode
				89	subtype.
				90
				91
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	92	.. c:function:: int PyUnicode_CheckExact(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	93
				94	Return true if the object o is a Unicode object, but not an instance of a
				95	subtype.
				96
				97
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	98	.. c:function:: int PyUnicode_READY(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	99
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	100	Ensure the string object o is in the "canonical" representation. This is
				101	required before using any of the access macros described below.
				102
				103	.. XXX expand on when it is not required
				104
				105	Returns 0 on success and -1 with an exception set on failure, which in
				106	particular happens if memory allocation fails.
				107
				108	.. versionadded:: 3.3
				109
				110
				111	.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
				112
				113	Return the length of the Unicode string, in code points. o has to be a
				114	Unicode object in the "canonical" representation (not checked).
				115
				116	.. versionadded:: 3.3
				117
				118
				119	.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
				120	Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
				121	Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)
				122
				123	Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
				124	integer types for direct character access. The macros do not check whether the
				125	canonical representation actually has the matching character size; use
Martin v. Löwis	2da16e6	2011-10-07 20:58:00 +0200	[diff] [blame]	126	:c:func:`PyUnicode_KIND` to select the right macro. Make sure
Martin v. Löwis	c47adb0	2011-10-07 20:55:35 +0200	[diff] [blame]	127	:c:func:`PyUnicode_READY` has been called before accessing this.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	128
				129	.. versionadded:: 3.3
				130
				131
Victor Stinner	b4938aa	2011-11-20 18:27:28 +0100	[diff] [blame]	132	.. c:macro:: PyUnicode_WCHAR_KIND
				133	PyUnicode_1BYTE_KIND
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	134	PyUnicode_2BYTE_KIND
				135	PyUnicode_4BYTE_KIND
				136
				137	Return values of the :c:func:`PyUnicode_KIND` macro.
				138
				139	.. versionadded:: 3.3
				140
				141
				142	.. c:function:: int PyUnicode_KIND(PyObject *o)
				143
				144	Return one of the PyUnicode kind constants (see above) that indicate how many
				145	bytes per character this Unicode object uses to store its data. o has to
				146	be a Unicode object in the "canonical" representation (not checked).
				147
				148	.. XXX document "0" return value?
				149
				150	.. versionadded:: 3.3
				151
				152
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	153	.. c:function:: void* PyUnicode_DATA(PyObject *o)
				154
				155	Return a void pointer to the raw unicode buffer. o has to be a Unicode
				156	object in the "canonical" representation (not checked).
				157
				158	.. versionadded:: 3.3
				159
				160
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	161	.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
				162	Py_UCS4 value)
				163
				164	Write a code point into the data buffer of a canonical representation (as obtained with
				165	:c:func:`PyUnicode_DATA`). This macro does not do any sanity checks and is
				166	intended for usage in loops. The caller should cache the kind value and
				167	data pointer as obtained from other macro calls. index is the index in
				168	the string (starts at 0) and value is the new code point value which should
				169	be written to that location.
				170
				171	.. versionadded:: 3.3
				172
				173
				174	.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)
				175
				176	Read a code point from the data buffer of a canonical representation (as obtained with
				177	:c:func:`PyUnicode_DATA`). No checks or ready calls are performed.
				178
				179	.. versionadded:: 3.3
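What ``PyUnicode_READ`` and ``PyUnicode_WRITE`` do can be pictured with a plain C sketch that dispatches on the kind. The names ``KIND_1BYTE``, ``read_code_point`` and ``write_code_point`` are illustrative stand-ins, not the real macros. The sketch assumes each kind value equals the number of bytes per character, and it adds assertions that the real macros do not perform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the kind constants: value = bytes per character. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Read the code point at `index` from a buffer of the given kind. */
static uint32_t read_code_point(int kind, const void *data, size_t index)
{
    switch (kind) {
    case KIND_1BYTE:
        return ((const uint8_t *)data)[index];
    case KIND_2BYTE:
        return ((const uint16_t *)data)[index];
    default:
        assert(kind == KIND_4BYTE);
        return ((const uint32_t *)data)[index];
    }
}

/* Write `value` at `index`; the value must fit the buffer's element width. */
static void write_code_point(int kind, void *data, size_t index, uint32_t value)
{
    switch (kind) {
    case KIND_1BYTE:
        assert(value <= 0xFF);
        ((uint8_t *)data)[index] = (uint8_t)value;
        break;
    case KIND_2BYTE:
        assert(value <= 0xFFFF);
        ((uint16_t *)data)[index] = (uint16_t)value;
        break;
    default:
        assert(kind == KIND_4BYTE);
        ((uint32_t *)data)[index] = value;
        break;
    }
}
```

In a loop, the kind and the data pointer would be fetched once before the loop and reused for every index, which is the caching the text recommends.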
				180
				181
				182	.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)
				183
				184	Read a character from a Unicode object o, which must be in the "canonical"
				185	representation. This is less efficient than :c:func:`PyUnicode_READ` if you
				186	do multiple consecutive reads.
				187
				188	.. versionadded:: 3.3
				189
				190
				191	.. c:function:: PyUnicode_MAX_CHAR_VALUE(PyObject *o)
				192
				193	Return the maximum code point that is suitable for creating another string
				194	based on o, which must be in the "canonical" representation. This is
				195	always an approximation but more efficient than iterating over the string.
				196
				197	.. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	198
Christian Heimes	a156e09	2008-02-16 07:38:31 +0000	[diff] [blame]	199
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	200	.. c:function:: int PyUnicode_ClearFreeList()
Christian Heimes	a156e09	2008-02-16 07:38:31 +0000	[diff] [blame]	201
				202	Clear the free list. Return the total number of freed items.
				203
Alexandre Vassalotti	6d3dfc3	2009-07-29 19:54:39 +0000	[diff] [blame]	204
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	205	.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
				206
				207	Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
				208	code units (this includes surrogate pairs as 2 units). o has to be a
				209	Unicode object (not checked).
				210
				211	.. deprecated-removed:: 3.3 4.0
				212	Part of the old-style Unicode API, please migrate to using
				213	:c:func:`PyUnicode_GET_LENGTH`.
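The remark about surrogate pairs counting as 2 units matters when the legacy representation uses 16-bit units: the size in code units can then exceed the length in code points. A minimal sketch of that arithmetic, assuming a 16-bit unit and using the hypothetical helper name ``utf16_units``:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit code units needed to store `n`
   code points. Code points above 0xFFFF take a surrogate pair (2 units). */
static size_t utf16_units(const uint32_t *code_points, size_t n)
{
    assert(code_points != NULL || n == 0);
    size_t units = 0;
    for (size_t i = 0; i < n; i++)
        units += (code_points[i] > 0xFFFF) ? 2 : 1;
    return units;
}
```

A three-code-point string such as ``U+0041 U+1F600 U+20AC`` therefore has a length of 3 but would occupy 4 units in a 16-bit representation.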
				214
				215
				216	.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
				217
				218	Return the size of the deprecated :c:type:`Py_UNICODE` representation in
				219	bytes. o has to be a Unicode object (not checked).
				220
				221	.. deprecated-removed:: 3.3 4.0
				222	Part of the old-style Unicode API, please migrate to using
				223	:c:func:`PyUnicode_GET_LENGTH`.
				224
				225
				226	.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
				227	const char* PyUnicode_AS_DATA(PyObject *o)
				228
				229	Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
				230	``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be
				231	a Unicode object (not checked).
				232
				233	.. versionchanged:: 3.3
				234	This macro is now inefficient -- because in many cases the
				235	:c:type:`Py_UNICODE` representation does not exist and needs to be created
				236	-- and can fail (return NULL with an exception set). Try to port the
				237	code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
				238	:c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
				239
				240	.. deprecated-removed:: 3.3 4.0
				241	Part of the old-style Unicode API, please migrate to using the
				242	:c:func:`PyUnicode_nBYTE_DATA` family of macros.
				243
				244
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	245	Unicode Character Properties
				246	""""""""""""""""""""""""""""
				247
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	248	Unicode provides many different character properties. The most often needed ones
				249	are available through these macros which are mapped to C functions depending on
				250	the Python configuration.
				251
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	252
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	253	.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	254
				255	Return 1 or 0 depending on whether ch is a whitespace character.
				256
				257
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	258	.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	259
				260	Return 1 or 0 depending on whether ch is a lowercase character.
				261
				262
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	263	.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	264
				265	Return 1 or 0 depending on whether ch is an uppercase character.
				266
				267
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	268	.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	269
				270	Return 1 or 0 depending on whether ch is a titlecase character.
				271
				272
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	273	.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	274
				275	Return 1 or 0 depending on whether ch is a linebreak character.
				276
				277
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	278	.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	279
				280	Return 1 or 0 depending on whether ch is a decimal character.
				281
				282
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	283	.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	284
				285	Return 1 or 0 depending on whether ch is a digit character.
				286
				287
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	288	.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	289
				290	Return 1 or 0 depending on whether ch is a numeric character.
				291
				292
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	293	.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	294
				295	Return 1 or 0 depending on whether ch is an alphabetic character.
				296
				297
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	298	.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	299
				300	Return 1 or 0 depending on whether ch is an alphanumeric character.
				301
Georg Brandl	559e5d7	2008-06-11 18:37:52 +0000	[diff] [blame]	302
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	303	.. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)
Georg Brandl	559e5d7	2008-06-11 18:37:52 +0000	[diff] [blame]	304
				305	Return 1 or 0 depending on whether ch is a printable character.
				306	Nonprintable characters are those characters defined in the Unicode character
				307	database as "Other" or "Separator", excepting the ASCII space (0x20) which is
				308	considered printable. (Note that printable characters in this context are
				309	those which should not be escaped when :func:`repr` is invoked on a string.
				310	It has no bearing on the handling of strings written to :data:`sys.stdout` or
				311	:data:`sys.stderr`.)
				312
				313
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	314	These APIs can be used for fast direct character conversions:
				315
				316
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	317	.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	318
				319	Return the character ch converted to lower case.
				320
				321
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	322	.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	323
				324	Return the character ch converted to upper case.
				325
				326
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	327	.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	328
				329	Return the character ch converted to title case.
				330
				331
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	332	.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	333
				334	Return the character ch converted to a decimal positive integer. Return
				335	``-1`` if this is not possible. This macro does not raise exceptions.
				336
				337
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	338	.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	339
				340	Return the character ch converted to a single digit integer. Return ``-1`` if
				341	this is not possible. This macro does not raise exceptions.
				342
				343
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	344	.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	345
				346	Return the character ch converted to a double. Return ``-1.0`` if this is not
				347	possible. This macro does not raise exceptions.
				348
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	349
Ezio Melotti	8c9375b	2011-08-22 20:03:25 +0300	[diff] [blame]	350	These APIs can be used to work with surrogates:
				351
				352	.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)
				353
				354	Check if ch is a surrogate (``0xD800 <= ch <= 0xDFFF``).
				355
				356	.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)
				357
				358	Check if ch is a high surrogate (``0xD800 <= ch <= 0xDBFF``).
				359
				360	.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)
				361
				362	Check if ch is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).
				363
				364	.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)
				365
				366	Join two surrogate characters and return a single Py_UCS4 value.
				367	high and low are respectively the leading and trailing surrogates in a
				368	surrogate pair.
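The range checks and the join operation follow directly from the UTF-16 encoding rules. The sketch below restates them as plain C functions; the function names are illustrative and are not the macros themselves, and the assertion in ``join_surrogates`` is an added safety check.

```c
#include <assert.h>
#include <stdint.h>

static int is_surrogate(uint32_t ch)      { return 0xD800 <= ch && ch <= 0xDFFF; }
static int is_high_surrogate(uint32_t ch) { return 0xD800 <= ch && ch <= 0xDBFF; }
static int is_low_surrogate(uint32_t ch)  { return 0xDC00 <= ch && ch <= 0xDFFF; }

/* Combine a leading (high) and trailing (low) surrogate into one code point:
   each surrogate carries 10 payload bits, offset by 0x10000. */
static uint32_t join_surrogates(uint32_t high, uint32_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}
```

For example, the pair ``0xD83D``, ``0xDE00`` joins to ``0x1F600``.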
				369
				370
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	371	Creating and accessing Unicode strings
				372	""""""""""""""""""""""""""""""""""""""
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	373
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	374	To create Unicode objects and access their basic sequence properties, use these
				375	APIs:
				376
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	377	.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	378
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	379	Create a new Unicode object. maxchar should be the true maximum code point
				380	to be placed in the string. As an approximation, it can be rounded up to the
				381	nearest value in the sequence 127, 255, 65535, 1114111.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	382
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	383	This is the recommended way to allocate a new Unicode object. Objects
				384	created using this function are not resizable.
				385
				386	.. versionadded:: 3.3
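One way to obtain a suitable maxchar before allocating is to scan the source data once and round the result up to the next value in the sequence above. This is a sketch in plain C under the assumption that the input is available as an array of code points; ``maxchar_for`` is a hypothetical helper, not an API function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: find the largest code point in `code_points` and
   round it up to the nearest of 127, 255, 65535, 1114111. */
static uint32_t maxchar_for(const uint32_t *code_points, size_t n)
{
    uint32_t max = 0;
    for (size_t i = 0; i < n; i++) {
        if (code_points[i] > max)
            max = code_points[i];
    }
    assert(max <= 1114111);     /* outside this range is not Unicode */

    if (max <= 127)
        return 127;
    if (max <= 255)
        return 255;
    if (max <= 65535)
        return 65535;
    return 1114111;
}
```

The returned value would then be passed as the maxchar argument together with the number of code points as the size.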
				387
				388
				389	.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
				390	Py_ssize_t size)
				391
				392	Create a new Unicode object with the given kind (possible values are
				393	:c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
				394	:c:func:`PyUnicode_KIND`). The buffer must point to an array of size
				395	units of 1, 2 or 4 bytes per character, as given by the kind.
				396
				397	.. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	398
				399
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	400	.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	401
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	402	Create a Unicode object from the char buffer u. The bytes will be
				403	interpreted as being UTF-8 encoded. The buffer is copied into the new
				404	object. If the buffer is not NULL, the return value might be a shared
				405	object, i.e. modification of the data is not allowed.
				406
				407	If u is NULL, this function behaves like :c:func:`PyUnicode_FromUnicode`
				408	with the buffer set to NULL. This usage is deprecated in favor of
				409	:c:func:`PyUnicode_New`.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	410
				411
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	412	.. c:function:: PyObject* PyUnicode_FromString(const char *u)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	413
				414	Create a Unicode object from a UTF-8 encoded null-terminated char buffer
				415	u.
				416
				417
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	418	.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	419
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	420	Take a C :c:func:`printf`\ -style format string and a variable number of
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	421	arguments, calculate the size of the resulting Python unicode string and return
				422	a string with the values formatted into it. The variable arguments must be C
				423	types and must correspond exactly to the format characters in the format
Victor Stinner	1205f27	2010-09-11 00:54:47 +0000	[diff] [blame]	424	ASCII-encoded string. The following format characters are allowed:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	425
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	426	.. % This should be exactly the same as the table in PyErr_Format.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	427	.. % The descriptions for %zd and %zu are wrong, but the truth is complicated
				428	.. % because not all compilers support the %z width modifier -- we fake it
				429	.. % when necessary via interpolating PY_FORMAT_SIZE_T.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	430	.. % Similar comments apply to the %ll width modifier and
				431	.. % PY_FORMAT_LONG_LONG.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	432
				433	+-------------------+---------------------+--------------------------------+
				434	| Format Characters | Type                | Comment                        |
				435	+===================+=====================+================================+
				436	| :attr:`%%`        | n/a                 | The literal % character.       |
				437	+-------------------+---------------------+--------------------------------+
				438	| :attr:`%c`        | int                 | A single character,            |
				439	|                   |                     | represented as a C int.        |
				440	+-------------------+---------------------+--------------------------------+
				441	| :attr:`%d`        | int                 | Exactly equivalent to          |
				442	|                   |                     | ``printf("%d")``.              |
				443	+-------------------+---------------------+--------------------------------+
				444	| :attr:`%u`        | unsigned int        | Exactly equivalent to          |
				445	|                   |                     | ``printf("%u")``.              |
				446	+-------------------+---------------------+--------------------------------+
				447	| :attr:`%ld`       | long                | Exactly equivalent to          |
				448	|                   |                     | ``printf("%ld")``.             |
				449	+-------------------+---------------------+--------------------------------+
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	450	| :attr:`%li`       | long                | Exactly equivalent to          |
				451	|                   |                     | ``printf("%li")``.             |
				452	+-------------------+---------------------+--------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	453	| :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
				454	|                   |                     | ``printf("%lu")``.             |
				455	+-------------------+---------------------+--------------------------------+
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	456	| :attr:`%lld`      | long long           | Exactly equivalent to          |
				457	|                   |                     | ``printf("%lld")``.            |
				458	+-------------------+---------------------+--------------------------------+
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	459	| :attr:`%lli`      | long long           | Exactly equivalent to          |
				460	|                   |                     | ``printf("%lli")``.            |
				461	+-------------------+---------------------+--------------------------------+
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	462	| :attr:`%llu`      | unsigned long long  | Exactly equivalent to          |
				463	|                   |                     | ``printf("%llu")``.            |
				464	+-------------------+---------------------+--------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	465	| :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
				466	|                   |                     | ``printf("%zd")``.             |
				467	+-------------------+---------------------+--------------------------------+
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	468	| :attr:`%zi`       | Py_ssize_t          | Exactly equivalent to          |
				469	|                   |                     | ``printf("%zi")``.             |
				470	+-------------------+---------------------+--------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	471	| :attr:`%zu`       | size_t              | Exactly equivalent to          |
				472	|                   |                     | ``printf("%zu")``.             |
				473	+-------------------+---------------------+--------------------------------+
				474	| :attr:`%i`        | int                 | Exactly equivalent to          |
				475	|                   |                     | ``printf("%i")``.              |
				476	+-------------------+---------------------+--------------------------------+
				477	| :attr:`%x`        | int                 | Exactly equivalent to          |
				478	|                   |                     | ``printf("%x")``.              |
				479	+-------------------+---------------------+--------------------------------+
				480	| :attr:`%s`        | char\*              | A null-terminated C character  |
				481	|                   |                     | array.                         |
				482	+-------------------+---------------------+--------------------------------+
				483	| :attr:`%p`        | void\*              | The hex representation of a C  |
				484	|                   |                     | pointer. Mostly equivalent to  |
				485	|                   |                     | ``printf("%p")`` except that   |
				486	|                   |                     | it is guaranteed to start with |
				487	|                   |                     | the literal ``0x`` regardless  |
				488	|                   |                     | of what the platform's         |
				489	|                   |                     | ``printf`` yields.             |
				490	+-------------------+---------------------+--------------------------------+
Georg Brandl	559e5d7	2008-06-11 18:37:52 +0000	[diff] [blame]	491	| :attr:`%A`        | PyObject\*          | The result of calling          |
				492	|                   |                     | :func:`ascii`.                 |
				493	+-------------------+---------------------+--------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	494	| :attr:`%U`        | PyObject\*          | A unicode object.              |
				495	+-------------------+---------------------+--------------------------------+
				496	| :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
				497	|                   |                     | NULL) and a null-terminated    |
				498	|                   |                     | C character array as a second  |
				499	|                   |                     | parameter (which will be used, |
				500	|                   |                     | if the first parameter is      |
				501	|                   |                     | NULL).                         |
				502	+-------------------+---------------------+--------------------------------+
				503	| :attr:`%S`        | PyObject\*          | The result of calling          |
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	504	|                   |                     | :c:func:`PyObject_Str`.        |
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	505	+-------------------+---------------------+--------------------------------+
				506	| :attr:`%R`        | PyObject\*          | The result of calling          |
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	507	|                   |                     | :c:func:`PyObject_Repr`.       |
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	508	+-------------------+---------------------+--------------------------------+
				509
				510	An unrecognized format character causes all the rest of the format string to be
				511	copied as-is to the result string, and any extra arguments are discarded.
				512
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	513	.. note::
				514
				515	The ``"%lld"`` and ``"%llu"`` format specifiers are only available
Georg Brandl	ef871f6	2010-03-12 10:06:40 +0000	[diff] [blame]	516	when :const:`HAVE_LONG_LONG` is defined.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	517
				518	.. versionchanged:: 3.2
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	519	Support for ``"%lld"`` and ``"%llu"`` added.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	520
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	521	.. versionchanged:: 3.3
				522	Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.
				523
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	524
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	525	.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	526
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	527	Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	528	arguments.
				529
Alexander Belopolsky	942af5a	2010-12-04 03:38:46 +0000	[diff] [blame]	530
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	531	.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
				532	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	533
				534	Coerce an encoded object obj to a Unicode object and return a reference with
				535	incremented refcount.
				536
Georg Brandl	952867a	2010-06-27 10:17:12 +0000	[diff] [blame]	537	:class:`bytes`, :class:`bytearray` and other char buffer compatible objects
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	538	are decoded according to the given encoding and using the error handling
				539	defined by errors. Both can be NULL to have the interface use the default
Georg Brandl	952867a	2010-06-27 10:17:12 +0000	[diff] [blame]	540	values (see the next section for details).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	541
				542	All other objects, including Unicode objects, cause a :exc:`TypeError` to be
				543	set.
				544
				545	The API returns NULL if there was an error. The caller is responsible for
				546	decref'ing the returned objects.
				547
				548
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	549	.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
				550
				551	Return the length of the Unicode object, in code points.
				552
				553	.. versionadded:: 3.3
				554
				555
				556	.. c:function:: int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, \
				557	PyObject *to, Py_ssize_t from_start, Py_ssize_t how_many)
				558
				559	Copy characters from one Unicode object into another. This function performs
				560	character conversion when necessary and falls back to :c:func:`memcpy` if
				561	possible. Returns ``-1`` and sets an exception on error, otherwise returns
				562	``0``.
				563
				564	.. versionadded:: 3.3
				565
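The difference between converting characters and copying them with ``memcpy``
can be illustrated without the C API. The following is only a sketch: it
assumes two storage widths (one byte and four bytes per character) as
stand-ins for the compact representations of :pep:`393`, and the helper names
``copy_same_width`` and ``copy_widening`` are hypothetical, not CPython
functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: source and destination use the same storage
   width, so the characters can be copied as raw memory. */
static void
copy_same_width(uint32_t *to, const uint32_t *from, size_t how_many)
{
    assert(to != NULL && from != NULL);
    memcpy(to, from, how_many * sizeof(*to));
}

/* Hypothetical helper: the source stores one byte per character and the
   destination four, so every character is converted individually. */
static void
copy_widening(uint32_t *to, const uint8_t *from, size_t how_many)
{
    assert(to != NULL && from != NULL);
    for (size_t i = 0; i < how_many; i++) {
        to[i] = from[i];
    }
}
```

Copying in the other direction (four bytes to one) would additionally have to
check that every character fits into the narrower storage.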

.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
                                        Py_UCS4 character)

   Write a character to a string. The string must have been created through
   :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
   the string must not be shared, or have been hashed yet.

   This function checks that *unicode* is a Unicode object, that the index is
   not out of bounds, and that the object can be modified safely (i.e. that
   its reference count is one), in contrast to the macro version
   :c:func:`PyUnicode_WRITE_CHAR`.

   .. versionadded:: 3.3


.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)

   Read a character from a string. This function checks that *unicode* is a
   Unicode object and the index is not out of bounds, in contrast to the macro
   version :c:func:`PyUnicode_READ_CHAR`.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
                                              Py_ssize_t end)

   Return a substring of *str*, from character index *start* (included) to
   character index *end* (excluded). Negative indices are not supported.

   .. versionadded:: 3.3


.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
                                          Py_ssize_t buflen, int copy_null)

   Copy the string *u* into a UCS4 buffer, including a null character, if
   *copy_null* is set. Returns *NULL* and sets an exception on error (in
   particular, a :exc:`ValueError` if *buflen* is smaller than the length of
   *u*). *buffer* is returned on success.

   .. versionadded:: 3.3


.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)

   Copy the string *u* into a new UCS4 buffer that is allocated using
   :c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a
   :exc:`MemoryError` set.

   .. versionadded:: 3.3


Deprecated Py_UNICODE APIs
""""""""""""""""""""""""""

.. deprecated-removed:: 3.3 4.0

These API functions are deprecated with the implementation of :pep:`393`.
Extension modules can continue using them, as they will not be removed in Python
3.x, but need to be aware that their use can now cause performance and memory hits.


.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
   may be *NULL* which causes the contents to be undefined. It is the user's
   responsibility to fill in the needed data. The buffer is copied into the new
   object.

   If the buffer is not *NULL*, the return value might be a shared object.
   Therefore, modification of the resulting Unicode object is only allowed when
   *u* is *NULL*.

   If the buffer is *NULL*, :c:func:`PyUnicode_READY` must be called once the
   string content has been filled before using any of the access macros such as
   :c:func:`PyUnicode_KIND`.

   Please migrate to using :c:func:`PyUnicode_FromKindAndData` or
   :c:func:`PyUnicode_New`.


.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal
   :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
   :c:type:`Py_UNICODE*` representation of the object if it is not yet
   available. Note that the resulting :c:type:`Py_UNICODE` string may contain
   embedded null characters, which would cause the string to be truncated when
   used in most C functions.

   Please migrate to using :c:func:`PyUnicode_AsUCS4`,
   :c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new
   APIs.


.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)

   Create a Unicode object by replacing all decimal digits in the
   :c:type:`Py_UNICODE` buffer *s* of the given *size* by ASCII digits 0--9
   according to their decimal value. Return *NULL* if an exception occurs.


.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)

   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
   array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
   may contain embedded null characters, which would cause the string to be
   truncated when used in most C functions.

   .. versionadded:: 3.3


.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)

   Create a copy of a Unicode string ending with a null character. Return *NULL*
   and raise a :exc:`MemoryError` exception on memory allocation failure,
   otherwise return a newly allocated buffer (use :c:func:`PyMem_Free` to free
   the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
   contain embedded null characters, which would cause the string to be
   truncated when used in most C functions.

   .. versionadded:: 3.2

   Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.


.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
   code units (this includes surrogate pairs as 2 units).

   Please migrate to using :c:func:`PyUnicode_GetLength`.

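The difference between code units and code points can be made concrete with a
small calculation. The sketch below assumes a platform where the legacy
representation uses 16-bit code units, so that code points above U+FFFF need a
surrogate pair; ``utf16_code_units`` is a hypothetical helper, not part of the
C API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit code units needed to store
   n code points, counting each surrogate pair as two units. */
static size_t
utf16_code_units(const uint32_t *code_points, size_t n)
{
    size_t units = 0;
    for (size_t i = 0; i < n; i++) {
        assert(code_points[i] <= 0x10FFFF);
        units += (code_points[i] > 0xFFFF) ? 2 : 1;
    }
    return units;
}
```

For the three code points U+0061, U+1F600 and U+0062 this helper reports four
code units, whereas the length in code points is three.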

.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
   throughout the interpreter whenever coercion to Unicode is needed.


Locale Encoding
"""""""""""""""

The current locale encoding can be used to decode text from the operating
system.

.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, Py_ssize_t len, int surrogateescape)

   Decode a string from the current locale encoding. The decoder is strict if
   *surrogateescape* is equal to zero, otherwise it uses the
   ``'surrogateescape'`` error handler (:pep:`383`) to escape undecodable
   bytes. If a byte sequence can be decoded as a surrogate character and
   *surrogateescape* is not equal to zero, the byte sequence is escaped using
   the ``'surrogateescape'`` error handler instead of being decoded. *str*
   must end with a null character but cannot contain embedded null characters.

   .. seealso::

      Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
      :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
      Python startup).

   .. versionadded:: 3.3

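The escaping performed by the ``'surrogateescape'`` error handler follows the
mapping defined in :pep:`383`: an undecodable byte in the range 0x80..0xFF is
represented by the lone surrogate U+DC80..U+DCFF, and the encoder reverses the
mapping. The sketch below shows only that arithmetic; the helper names are
hypothetical and the code is not the CPython implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map an undecodable byte (0x80..0xFF) to the
   lone surrogate U+DC80..U+DCFF, as described in PEP 383. */
static uint32_t
escape_undecodable_byte(unsigned char byte)
{
    assert(byte >= 0x80);
    return 0xDC00u + byte;
}

/* Hypothetical helper: reverse mapping used when encoding.  Returns 1
   and stores the original byte if code_point is an escaped byte,
   otherwise returns 0 and leaves *byte untouched. */
static int
unescape_to_byte(uint32_t code_point, unsigned char *byte)
{
    if (code_point >= 0xDC80u && code_point <= 0xDCFFu) {
        *byte = (unsigned char)(code_point - 0xDC00u);
        return 1;
    }
    return 0;
}
```

Because the mapping is reversible, bytes that could not be decoded survive a
decode and encode round trip unchanged.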

.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, int surrogateescape)

   Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string
   length using :c:func:`strlen`.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, int surrogateescape)

   Encode a Unicode object to the current locale encoding. The encoder is
   strict if *surrogateescape* is equal to zero, otherwise it uses the
   ``'surrogateescape'`` error handler (:pep:`383`). Return a :class:`bytes`
   object. *unicode* cannot contain embedded null characters.

   .. seealso::

      Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
      :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
      Python startup).

   .. versionadded:: 3.3


File System Encoding
""""""""""""""""""""

To encode and decode file names and other environment strings,
:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
encode file names during argument parsing, the ``"O&"`` converter should be
used, passing :c:func:`PyUnicode_FSConverter` as the conversion function:

.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)

   ParseTuple converter: encode :class:`str` objects to :class:`bytes` using
   :c:func:`PyUnicode_EncodeFSDefault`; :class:`bytes` objects are output as-is.
   *result* must be the address of a :c:type:`PyBytesObject*` variable; the
   object stored there must be released when it is no longer used.

   .. versionadded:: 3.1


To decode file names during argument parsing, the ``"O&"`` converter should be
used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:

.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)

   ParseTuple converter: decode :class:`bytes` objects to :class:`str` using
   :c:func:`PyUnicode_DecodeFSDefaultAndSize`; :class:`str` objects are output
   as-is. *result* must be the address of a :c:type:`PyUnicodeObject*` variable;
   the object stored there must be released when it is no longer used.

   .. versionadded:: 3.2


.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)

   Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
   ``'surrogateescape'`` error handler, or ``'strict'`` on Windows.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   .. seealso::

      :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
      locale encoding and cannot be modified later. If you need to decode a
      string from the current locale encoding, use
      :c:func:`PyUnicode_DecodeLocaleAndSize`.

   .. versionchanged:: 3.2
      Use ``'strict'`` error handler on Windows.


.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)

   Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
   and the ``'surrogateescape'`` error handler, or ``'strict'`` on Windows.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.

   .. versionchanged:: 3.2
      Use ``'strict'`` error handler on Windows.


.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)

   Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
   ``'surrogateescape'`` error handler, or ``'strict'`` on Windows, and return
   :class:`bytes`. Note that the resulting :class:`bytes` object may contain
   null bytes.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   .. seealso::

      :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
      locale encoding and cannot be modified later. If you need to encode a
      string to the current locale encoding, use
      :c:func:`PyUnicode_EncodeLocale`.

   .. versionadded:: 3.2


wchar_t Support
"""""""""""""""

:c:type:`wchar_t` support for platforms which support it:

.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given *size*.
   Passing -1 as the *size* indicates that the function must itself compute the length,
   using wcslen.
   Return *NULL* on failure.


.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
   *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :c:type:`wchar_t` characters
   copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
   required by the application. Also, note that the :c:type:`wchar_t*` string
   might contain null characters, which would cause the string to be truncated
   when used with most C functions.

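The truncation caused by embedded null characters is easy to reproduce with
the standard wide-character functions alone. The following sketch uses only
``<wchar.h>``; ``has_embedded_null`` is a hypothetical helper that a caller
could apply to a buffer whose real length is known, for example from a return
value.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: report whether the first len characters of buf
   contain a null character, i.e. whether wcslen() and similar functions
   would stop before the real end of the data. */
static int
has_embedded_null(const wchar_t *buf, size_t len)
{
    assert(buf != NULL);
    return wmemchr(buf, L'\0', len) != NULL;
}
```

For a five-character buffer holding ``a``, ``b``, a null character, ``c`` and
``d``, ``wcslen`` reports a length of two, while the helper detects the
embedded null character.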

.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)

   Convert the Unicode object to a wide character string. The output string
   always ends with a null character. If *size* is not *NULL*, write the number
   of wide characters (excluding the trailing 0-termination character) into
   *\*size*.

   Returns a buffer allocated by :c:func:`PyMem_Malloc` (use
   :c:func:`PyMem_Free` to free it) on success. On error, returns *NULL* and
   raises a :exc:`MemoryError`; *\*size* is undefined in that case. Note that the
   resulting :c:type:`wchar_t` string might contain null characters, which
   would cause the string to be truncated when used with most C functions.

   .. versionadded:: 3.2


UCS4 Support
""""""""""""

.. versionadded:: 3.3

.. XXX are these meant to be public?

.. c:function:: size_t Py_UCS4_strlen(const Py_UCS4 *u)
                Py_UCS4* Py_UCS4_strcpy(Py_UCS4 *s1, const Py_UCS4 *s2)
                Py_UCS4* Py_UCS4_strncpy(Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
                Py_UCS4* Py_UCS4_strcat(Py_UCS4 *s1, const Py_UCS4 *s2)
                int Py_UCS4_strcmp(const Py_UCS4 *s1, const Py_UCS4 *s2)
                int Py_UCS4_strncmp(const Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
                Py_UCS4* Py_UCS4_strchr(const Py_UCS4 *s, Py_UCS4 c)
                Py_UCS4* Py_UCS4_strrchr(const Py_UCS4 *s, Py_UCS4 c)

   These utility functions work on strings of :c:type:`Py_UCS4` characters and
   otherwise behave like the C standard library functions with the same name.

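To show what "behaving like the C standard library functions" means for
32-bit characters, here is a minimal sketch of two such helpers. It is not the
CPython source: ``ucs4_char`` is an assumed stand-in for :c:type:`Py_UCS4`,
and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ucs4_char;   /* assumed stand-in for Py_UCS4 */

/* Like strlen(): count characters up to, but not including, the
   terminating null character. */
static size_t
ucs4_strlen_sketch(const ucs4_char *u)
{
    size_t n = 0;
    assert(u != NULL);
    while (u[n] != 0) {
        n++;
    }
    return n;
}

/* Like strchr(): return a pointer to the first occurrence of c, or
   NULL if c does not occur.  The terminator itself matches c == 0. */
static const ucs4_char *
ucs4_strchr_sketch(const ucs4_char *s, ucs4_char c)
{
    assert(s != NULL);
    for (;; s++) {
        if (*s == c) {
            return s;
        }
        if (*s == 0) {
            return NULL;
        }
    }
}
```

Unlike their ``char`` counterparts, these helpers treat every code point,
including those above U+FFFF, as a single character.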

.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments *encoding* and *errors*, and they
have the same semantics as the ones of the built-in :func:`str` string object
constructor.

Setting *encoding* to *NULL* causes the default encoding to be used,
which is UTF-8. The file system calls should use
:c:func:`PyUnicode_FSConverter` for encoding file names. This uses the
variable :c:data:`Py_FileSystemDefaultEncoding` internally. This
variable should be treated as read-only: on some systems, it will be a
pointer to a static string, on others, it will change at run-time
(such as when the application invokes setlocale).

Error handling is set by *errors* which may also be set to *NULL* meaning to use
the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface. For simplicity, only deviations from
the following generic ones are documented.


Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
                              const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`str` built-in function. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.


.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
                              const char *encoding, const char *errors)

   Encode a Unicode object and return the result as a Python bytes object.
   *encoding* and *errors* have the same meaning as the parameters of the same
   name in the Unicode :meth:`encode` method. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.


.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
                              const char *encoding, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
   bytes object. *encoding* and *errors* have the same meaning as the
   parameters of the same name in the Unicode :meth:`encode` method. The codec
   to be used is looked up using the Python codec registry. Return *NULL* if an
   exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsEncodedString`.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

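The notion of a trailing incomplete sequence can be sketched in plain C. The
helper below is hypothetical and much simpler than a real decoder: it assumes
the input is otherwise well-formed UTF-8 and only computes how many leading
bytes end on a complete sequence boundary, which corresponds to the value a
stateful decoder would report as consumed for such input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the number of leading bytes of s that end
   on a complete UTF-8 sequence, assuming otherwise well-formed input. */
static size_t
utf8_complete_prefix(const unsigned char *s, size_t size)
{
    assert(s != NULL || size == 0);
    /* A truncated sequence can leave at most 3 bytes at the end. */
    for (size_t back = 1; back <= 3 && back <= size; back++) {
        unsigned char b = s[size - back];
        size_t need;

        if ((b & 0xC0) == 0x80) {
            continue;                 /* continuation byte, look further back */
        }
        if (b < 0x80) {
            need = 1;                 /* ASCII */
        } else if ((b & 0xE0) == 0xC0) {
            need = 2;
        } else if ((b & 0xF0) == 0xE0) {
            need = 3;
        } else if ((b & 0xF8) == 0xF0) {
            need = 4;
        } else {
            return size;              /* invalid lead byte: not handled here */
        }
        /* The sequence starting at this lead byte has back bytes available. */
        return (need > back) ? size - back : size;
    }
    return size;
}
```

A caller reading from a stream would decode only the reported prefix and keep
the remaining bytes until more input arrives.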

.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)

   Encode a Unicode object using UTF-8 and return the result as a Python bytes
   object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


.. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)

   Return a pointer to the default encoding (UTF-8) of the Unicode object, and
   store the size of the encoded representation (in bytes) in *size*. *size*
   can be *NULL*; in this case no size will be stored.

   In the case of an error, *NULL* is returned with an exception set and no
   *size* is stored.

   This caches the UTF-8 representation of the string in the Unicode object, and
   subsequent calls will return a pointer to the same buffer. The caller is not
   responsible for deallocating the buffer.

   .. versionadded:: 3.3


.. c:function:: char* PyUnicode_AsUTF8(PyObject *unicode)

   As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF8String` or :c:func:`PyUnicode_AsUTF8AndSize`.


UTF-32 Codecs
"""""""""""""

These are the UTF-32 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first four bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output.

   After completion, *\*byteorder* is set to the current byte order at the end
   of input data.

   In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

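The byte order mark check described for the native-order case can be sketched
as follows. The helper ``utf32_bom_byteorder`` is hypothetical and only
inspects the first four bytes; it uses the same ``-1``, ``0`` and ``1``
convention as the *byteorder* argument, with ``0`` meaning that no BOM was
found.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return -1 for a little-endian BOM (FF FE 00 00),
   1 for a big-endian BOM (00 00 FE FF), and 0 when no BOM is present. */
static int
utf32_bom_byteorder(const unsigned char *s, size_t size)
{
    assert(s != NULL || size == 0);
    if (size < 4) {
        return 0;
    }
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
        return -1;
    }
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
        return 1;
    }
    return 0;
}
```

A decoder that starts in native order would switch to the detected order and
skip the four BOM bytes before decoding the rest of the input.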
				1068
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1069	.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
				1070	const char errors, int byteorder, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1071
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1072	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF32`. If
				1073	consumed is not NULL, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1074	trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
				1075	by four) as an error. Those bytes will not be decoded and the number of bytes
				1076	that have been decoded will be stored in consumed.
				1077
				1078
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1079	.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				1080
				1081	Return a Python byte string using the UTF-32 encoding in native byte
				1082	order. The string always starts with a BOM mark. Error handling is "strict".
				1083	Return NULL if an exception was raised by the codec.
				1084
				1085
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1086	.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
				1087	const char *errors, int byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1088
				1089	Return a Python bytes object holding the UTF-32 encoded value of the Unicode
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1090	data in s. Output is written according to the following byte order::
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1091
				1092	byteorder == -1: little endian
				1093	byteorder == 0: native byte order (writes a BOM mark)
				1094	byteorder == 1: big endian
				1095
				1096	If byteorder is ``0``, the output string will always start with the Unicode BOM
				1097	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
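
The byte order rules above can be modelled in plain C. What follows is a minimal sketch under stated assumptions, not CPython's implementation: ``sketch_utf32_encode`` and ``put_u32`` are hypothetical helpers, the input is assumed to be already-validated code points, and the caller is assumed to provide an output buffer of at least ``4 * (n + 1)`` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store one 32-bit unit in the requested byte order. */
static void
put_u32(unsigned char *out, uint32_t ch, int little)
{
    int k;
    for (k = 0; k < 4; k++) {
        int shift = little ? 8 * k : 8 * (3 - k);
        out[k] = (unsigned char)((ch >> shift) & 0xFF);
    }
}

/* Hypothetical helper (not part of the C API): write n code points as
   UTF-32 into out and return the number of bytes written. */
size_t
sketch_utf32_encode(const uint32_t *cp, size_t n, int byteorder,
                    unsigned char *out)
{
    const uint16_t probe = 1;
    /* Native order: a little-endian machine stores the low byte first. */
    int little = (*(const unsigned char *)&probe == 1);
    size_t pos = 0;
    size_t i;

    assert(cp != NULL && out != NULL);
    if (byteorder == -1) {
        little = 1;                     /* forced little endian, no BOM */
    }
    else if (byteorder == 1) {
        little = 0;                     /* forced big endian, no BOM */
    }
    else {
        /* byteorder == 0 (any other value is treated the same way in
           this sketch): native order, preceded by U+FEFF. */
        put_u32(out, 0xFEFF, little);
        pos = 4;
    }
    for (i = 0; i < n; i++, pos += 4)
        put_u32(out + pos, cp[i], little);
    return pos;
}
```

With ``byteorder`` set to ``-1`` or ``1`` the output holds exactly ``4 * n`` bytes; with ``0`` it is four bytes longer because of the leading BOM.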
				1098
				1099	If Py_UNICODE_WIDE is not defined, surrogate pairs will be output
				1100	as a single codepoint.
				1101
				1102	Return NULL if an exception was raised by the codec.
				1103
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1104	.. deprecated-removed:: 3.3 4.0
				1105	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1106	:c:func:`PyUnicode_AsUTF32String`.
				1107
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1108
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1109	UTF-16 Codecs
				1110	"""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1111
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1112	These are the UTF-16 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1113
				1114
.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1117
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1118	Decode size bytes from a UTF-16 encoded buffer string and return the
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1119	corresponding Unicode object. errors (if non-NULL) defines the error
				1120	handling. It defaults to "strict".
				1121
				1122	If byteorder is non-NULL, the decoder starts decoding using the given byte
				1123	order::
				1124
				1125	*byteorder == -1: little endian
				1126	*byteorder == 0: native order
				1127	*byteorder == 1: big endian
				1128
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1129	If ``*byteorder`` is zero, and the first two bytes of the input data are a
				1130	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				1131	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				1132	``1``, any byte order mark is copied to the output (where it will result in
				1133	either a ``\ufeff`` or a ``\ufffe`` character).
				1134
After completion, ``*byteorder`` is set to the current byte order at the end
of input data.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1137
				1138	If byteorder is NULL, the codec starts in native order mode.
				1139
				1140	Return NULL if an exception was raised by the codec.
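
The BOM handling described above can be illustrated with a standalone sketch. It models only the documented rules and is not the decoder itself; ``sketch_utf16_skip_bom`` is a hypothetical name. It returns how many leading bytes would be dropped as a BOM and updates ``*byteorder`` the way the text describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the C API). Returns 2 if a BOM is
   recognised and skipped, 0 otherwise. byteorder may be NULL, which
   means "start in native order". */
size_t
sketch_utf16_skip_bom(const unsigned char *s, size_t size, int *byteorder)
{
    int bo = byteorder ? *byteorder : 0;
    size_t skip = 0;

    assert(s != NULL || size == 0);
    /* Only the "native order" mode looks for a BOM; with -1 or 1 the
       mark is ordinary data and stays in the output. */
    if (bo == 0 && size >= 2) {
        if (s[0] == 0xFF && s[1] == 0xFE) {
            bo = -1;                    /* little endian */
            skip = 2;
        }
        else if (s[0] == 0xFE && s[1] == 0xFF) {
            bo = 1;                     /* big endian */
            skip = 2;
        }
    }
    if (byteorder != NULL)
        *byteorder = bo;                /* report the order now in effect */
    return skip;
}
```

A caller that does not know the byte order in advance would pass a variable initialised to ``0`` and inspect it afterwards.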
				1141
				1142
.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1145
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1146	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF16`. If
				1147	consumed is not NULL, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1148	trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
				1149	split surrogate pair) as an error. Those bytes will not be decoded and the
				1150	number of bytes that have been decoded will be stored in consumed.
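
To make the two "trailing incomplete" cases concrete, here is a hedged sketch in plain C. It is not the decoder: ``sketch_utf16_complete_prefix`` is a hypothetical helper that only computes how many leading bytes form complete UTF-16 sequences, which corresponds to the value a caller would expect in *consumed* for otherwise valid input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the C API). little_endian is nonzero
   for little-endian data, zero for big-endian data. */
size_t
sketch_utf16_complete_prefix(const unsigned char *s, size_t size,
                             int little_endian)
{
    size_t usable = size - (size % 2);  /* a trailing odd byte stays pending */

    assert(s != NULL || size == 0);
    if (usable >= 2) {
        const unsigned char *p = s + usable - 2;
        unsigned int unit = little_endian
            ? (p[0] | ((unsigned int)p[1] << 8))
            : (((unsigned int)p[0] << 8) | p[1]);
        /* A high surrogate at the very end is the first half of a split
           pair, so it has to wait for the next chunk. */
        if (unit >= 0xD800 && unit <= 0xDBFF)
            usable -= 2;
    }
    return usable;
}
```

In a streaming loop, the bytes after the returned offset would be kept and prepended to the next chunk before decoding again.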
				1151
				1152
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1153	.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				1154
Return a Python bytes object using the UTF-16 encoding in native byte
order. The result always starts with a byte order mark (BOM). Error handling is "strict".
Return *NULL* if an exception was raised by the codec.
				1158
				1159
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1160	.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
				1161	const char *errors, int byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1162
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1163	Return a Python bytes object holding the UTF-16 encoded value of the Unicode
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1164	data in s. Output is written according to the following byte order::
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1165
				1166	byteorder == -1: little endian
				1167	byteorder == 0: native byte order (writes a BOM mark)
				1168	byteorder == 1: big endian
				1169
				1170	If byteorder is ``0``, the output string will always start with the Unicode BOM
				1171	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				1172
If Py_UNICODE_WIDE is defined, a single :c:type:`Py_UNICODE` value may get
represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
value is interpreted as a UCS-2 character.
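
As an illustration of what "represented as a surrogate pair" means, the following sketch applies the standard UTF-16 arithmetic to a single code point. ``sketch_utf16_units`` is a hypothetical helper written for this example and is not part of the C API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split one code point into UTF-16 code units.
   Returns the number of units stored in units[] (1 or 2). */
int
sketch_utf16_units(uint32_t cp, uint16_t units[2])
{
    assert(cp <= 0x10FFFF);
    if (cp < 0x10000) {
        units[0] = (uint16_t)cp;        /* BMP: one unit, unchanged */
        return 1;
    }
    cp -= 0x10000;                      /* 20 bits remain */
    units[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    units[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

For example, U+1F600 becomes the two code units ``0xD83D`` and ``0xDE00``.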
				1176
				1177	Return NULL if an exception was raised by the codec.
				1178
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1179	.. deprecated-removed:: 3.3 4.0
				1180	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1181	:c:func:`PyUnicode_AsUTF16String`.
				1182
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1183
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1184	UTF-7 Codecs
				1185	""""""""""""
				1186
				1187	These are the UTF-7 codec APIs:
				1188
				1189
.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1191
				1192	Create a Unicode object by decoding size bytes of the UTF-7 encoded string
				1193	s. Return NULL if an exception was raised by the codec.
				1194
				1195
.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1198
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1199	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF7`. If
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1200	consumed is not NULL, trailing incomplete UTF-7 base-64 sections will not
				1201	be treated as an error. Those bytes will not be decoded and the number of
				1202	bytes that have been decoded will be stored in consumed.
				1203
				1204
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1205	.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
				1206	int base64SetO, int base64WhiteSpace, const char *errors)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1207
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1208	Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1209	return a Python bytes object. Return NULL if an exception was raised by
				1210	the codec.
				1211
				1212	If base64SetO is nonzero, "Set O" (punctuation that has no otherwise
				1213	special meaning) will be encoded in base-64. If base64WhiteSpace is
				1214	nonzero, whitespace will be encoded in base-64. Both are set to zero for the
				1215	Python "utf-7" codec.
				1216
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1217	.. deprecated-removed:: 3.3 4.0
				1218	Part of the old-style :c:type:`Py_UNICODE` API.
				1219
				1220	.. XXX replace with what?
				1221
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1222
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1223	Unicode-Escape Codecs
				1224	"""""""""""""""""""""
				1225
				1226	These are the "Unicode Escape" codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1227
				1228
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1229	.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
				1230	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1231
				1232	Create a Unicode object by decoding size bytes of the Unicode-Escape encoded
				1233	string s. Return NULL if an exception was raised by the codec.
				1234
				1235
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1236	.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				1237
Encode a Unicode object using Unicode-Escape and return the result as a Python
bytes object. Error handling is "strict". Return *NULL* if an exception was
raised by the codec.
				1241
				1242
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1243	.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1244
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
return a Python bytes object. Return *NULL* if an exception was raised by the
codec.
				1248
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1249	.. deprecated-removed:: 3.3 4.0
				1250	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1251	:c:func:`PyUnicode_AsUnicodeEscapeString`.
				1252
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1253
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1254	Raw-Unicode-Escape Codecs
				1255	"""""""""""""""""""""""""
				1256
				1257	These are the "Raw Unicode Escape" codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1258
				1259
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1260	.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
				1261	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1262
				1263	Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape
				1264	encoded string s. Return NULL if an exception was raised by the codec.
				1265
				1266
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1267	.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				1268
Encode a Unicode object using Raw-Unicode-Escape and return the result as a
Python bytes object. Error handling is "strict". Return *NULL* if an exception
was raised by the codec.
				1272
				1273
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1274	.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
				1275	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1276
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
and return a Python bytes object. Return *NULL* if an exception was raised by
the codec.
				1280
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1281	.. deprecated-removed:: 3.3 4.0
				1282	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1283	:c:func:`PyUnicode_AsRawUnicodeEscapeString`.
				1284
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1285
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1286	Latin-1 Codecs
				1287	""""""""""""""
				1288
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1289	These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
				1290	ordinals and only these are accepted by the codecs during encoding.
				1291
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1292
.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1294
				1295	Create a Unicode object by decoding size bytes of the Latin-1 encoded string
				1296	s. Return NULL if an exception was raised by the codec.
				1297
				1298
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1299	.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				1300
				1301	Encode a Unicode object using Latin-1 and return the result as Python bytes
				1302	object. Error handling is "strict". Return NULL if an exception was
				1303	raised by the codec.
				1304
				1305
.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1307
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1308	Encode the :c:type:`Py_UNICODE` buffer of the given size using Latin-1 and
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1309	return a Python bytes object. Return NULL if an exception was raised by
				1310	the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1311
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1312	.. deprecated-removed:: 3.3 4.0
				1313	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1314	:c:func:`PyUnicode_AsLatin1String`.
				1315
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1316
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1317	ASCII Codecs
				1318	""""""""""""
				1319
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1320	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				1321	codes generate errors.
				1322
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1323
.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1325
				1326	Create a Unicode object by decoding size bytes of the ASCII encoded string
				1327	s. Return NULL if an exception was raised by the codec.
				1328
				1329
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1330	.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				1331
				1332	Encode a Unicode object using ASCII and return the result as Python bytes
				1333	object. Error handling is "strict". Return NULL if an exception was
				1334	raised by the codec.
				1335
				1336
.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1338
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1339	Encode the :c:type:`Py_UNICODE` buffer of the given size using ASCII and
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1340	return a Python bytes object. Return NULL if an exception was raised by
				1341	the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1342
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1343	.. deprecated-removed:: 3.3 4.0
				1344	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1345	:c:func:`PyUnicode_AsASCIIString`.
				1346
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1347
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1348	Character Map Codecs
				1349	""""""""""""""""""""
				1350
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1351	This codec is special in that it can be used to implement many different codecs
				1352	(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.
				1355
				1356	Decoding mappings must map single string characters to single Unicode
				1357	characters, integers (which are then interpreted as Unicode ordinals) or None
				1358	(meaning "undefined mapping" and causing an error).
				1359
				1360	Encoding mappings must map single Unicode characters to single string
				1361	characters, integers (which are then interpreted as Latin-1 ordinals) or None
				1362	(meaning "undefined mapping" and causing an error).
				1363
The mapping objects provided need only support the ``__getitem__`` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied as-is,
meaning that its ordinal value will be interpreted as a Unicode ordinal (when
decoding) or a Latin-1 ordinal (when encoding). Because of this, mappings only
need to contain those entries which map characters to different code points.
				1371
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1372	These are the mapping codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1373
.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1376
				1377	Create a Unicode object by decoding size bytes of the encoded string s using
				1378	the given mapping object. Return NULL if an exception was raised by the
codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can be a
dictionary mapping byte values, or a Unicode string that is treated as a lookup table.
Byte values greater than or equal to the length of the string and U+FFFE "characters" are
treated as "undefined mapping".
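
The lookup-table case can be sketched in plain C. This is an illustration of the rules just stated, not the codec's code: ``sketch_charmap_decode`` is a hypothetical helper, the table is limited to 16-bit entries for brevity, and an undefined mapping is simply reported as ``-1`` (roughly what "strict" error handling would turn into an exception).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of the C API). Decodes size bytes through
   table and returns the number of code points written to out, or -1 on an
   undefined mapping. A NULL table means Latin-1. out needs room for size
   entries. */
ptrdiff_t
sketch_charmap_decode(const unsigned char *s, size_t size,
                      const uint16_t *table, size_t table_len,
                      uint16_t *out)
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < size; i++) {
        if (table == NULL) {
            out[i] = s[i];              /* Latin-1: byte value == ordinal */
        }
        else if (s[i] >= table_len || table[s[i]] == 0xFFFE) {
            return -1;                  /* outside the table, or U+FFFE */
        }
        else {
            out[i] = table[s[i]];
        }
    }
    return (ptrdiff_t)size;
}
```

A three-entry table such as ``{0x0391, 0xFFFE, 0x0393}`` therefore decodes the bytes 0 and 2, and rejects byte 1 (marked undefined) as well as any byte of 3 or above.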
				1383
				1384
.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1386
Encode a Unicode object using the given *mapping* object and return the result
as a Python bytes object. Error handling is "strict". Return *NULL* if an
exception was raised by the codec.
				1390
The following codec API is special in that it maps Unicode to Unicode.
				1392
				1393
.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *table, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1396
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1397	Translate a :c:type:`Py_UNICODE` buffer of the given size by applying a
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1398	character mapping table to it and return the resulting Unicode object. Return
				1399	NULL when an exception was raised by the codec.
				1400
				1401	The mapping table must map Unicode ordinal integers to Unicode ordinal
				1402	integers or None (causing deletion of the character).
				1403
				1404	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				1405	and sequences work well. Unmapped character ordinals (ones which cause a
				1406	:exc:`LookupError`) are left untouched and are copied as-is.
				1407
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1408	.. deprecated-removed:: 3.3 4.0
				1409	Part of the old-style :c:type:`Py_UNICODE` API.
				1410
				1411	.. XXX replace with what?
Jeroen Ruigrok van der Werven	47a7d70	2009-04-27 05:43:17 +0000	[diff] [blame]	1412
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1413
.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)
				1416
Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
*mapping* object and return a Python bytes object. Return *NULL* if an
exception was raised by the codec.
				1420
				1421	.. deprecated-removed:: 3.3 4.0
				1422	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1423	:c:func:`PyUnicode_AsCharmapString`.
				1424
				1425
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1426	MBCS codecs for Windows
				1427	"""""""""""""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1428
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1429	These are the MBCS codec APIs. They are currently only available on Windows and
				1430	use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
				1431	DBCS) is a class of encodings, not just one. The target encoding is defined by
				1432	the user settings on the machine running the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1433
.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1435
				1436	Create a Unicode object by decoding size bytes of the MBCS encoded string s.
				1437	Return NULL if an exception was raised by the codec.
				1438
				1439
.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, \
                              const char *errors, int *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1442
If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
*consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
a trailing lead byte, and the number of bytes that have been decoded will be stored
in *consumed*.
				1447
				1448
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1449	.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
				1450
				1451	Encode a Unicode object using MBCS and return the result as Python bytes
				1452	object. Error handling is "strict". Return NULL if an exception was
				1453	raised by the codec.
				1454
				1455
.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)
				1457
				1458	Encode the Unicode object using the specified code page and return a Python
				1459	bytes object. Return NULL if an exception was raised by the codec. Use
				1460	:c:data:`CP_ACP` code page to get the MBCS encoder.
				1461
				1462	.. versionadded:: 3.3
				1463
				1464
.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1466
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1467	Encode the :c:type:`Py_UNICODE` buffer of the given size using MBCS and return
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1468	a Python bytes object. Return NULL if an exception was raised by the
				1469	codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1470
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1471	.. deprecated-removed:: 3.3 4.0
				1472	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
Victor Stinner	b682101	2011-12-09 00:18:11 +0100	[diff] [blame]	1473	:c:func:`PyUnicode_AsMBCSString` or :c:func:`PyUnicode_EncodeCodePage`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1474
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1475
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1476	Methods & Slots
				1477	"""""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1478
				1479
				1480	.. _unicodemethodsandslots:
				1481
				1482	Methods and Slot Functions
				1483	^^^^^^^^^^^^^^^^^^^^^^^^^^
				1484
				1485	The following APIs are capable of handling Unicode objects and strings on input
				1486	(we refer to them as strings in the descriptions) and return Unicode objects or
				1487	integers as appropriate.
				1488
				1489	They all return NULL or ``-1`` if an exception occurs.
				1490
				1491
.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

Concatenate two strings, giving a new Unicode string.
				1495
				1496
.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1498
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1499	Split a string giving a list of Unicode strings. If sep is NULL, splitting
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1500	will be done at all whitespace substrings. Otherwise, splits occur at the given
				1501	separator. At most maxsplit splits will be done. If negative, no limit is
				1502	set. Separators are not included in the resulting list.
				1503
				1504
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1505	.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1506
				1507	Split a Unicode string at line breaks, returning a list of Unicode strings.
CRLF is considered to be one line break. If *keepend* is 0, the line break
characters are not included in the resulting strings.
				1510
				1511
.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \
                              const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1514
				1515	Translate a string by applying a character mapping table to it and return the
				1516	resulting Unicode object.
				1517
				1518	The mapping table must map Unicode ordinal integers to Unicode ordinal integers
				1519	or None (causing deletion of the character).
				1520
				1521	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				1522	and sequences work well. Unmapped character ordinals (ones which cause a
				1523	:exc:`LookupError`) are left untouched and are copied as-is.
				1524
				1525	errors has the usual meaning for codecs. It may be NULL which indicates to
				1526	use the default error handling.
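
The three outcomes of a table lookup (replace, delete, leave untouched) can be modelled without any Python objects. The sketch below is an assumption-laden illustration only: ``sketch_map_entry`` and ``sketch_translate`` are hypothetical names, a small array of entries stands in for the mapping object, and a ``drop`` flag stands in for a ``None`` value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t from;      /* ordinal to look up */
    uint32_t to;        /* replacement ordinal, ignored when drop is set */
    int drop;           /* nonzero plays the role of a None entry */
} sketch_map_entry;

/* Hypothetical helper (not part of the C API). Translates size ordinals
   and returns how many were written to out (room for size entries). */
size_t
sketch_translate(const uint32_t *s, size_t size,
                 const sketch_map_entry *table, size_t table_len,
                 uint32_t *out)
{
    size_t i, j, n = 0;

    assert(out != NULL);
    for (i = 0; i < size; i++) {
        uint32_t ch = s[i];             /* default: copied as-is */
        int drop = 0;
        for (j = 0; j < table_len; j++) {
            if (table[j].from == s[i]) {
                ch = table[j].to;
                drop = table[j].drop;
                break;
            }
        }
        if (!drop)
            out[n++] = ch;
    }
    return n;
}
```

With a table that maps ``'a'`` to ``'b'`` and drops ``'x'``, the input ``"axc"`` yields ``"bc"``: one character replaced, one deleted, one left alone because it has no entry.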
				1527
				1528
.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1530
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1531	Join a sequence of strings using the given separator and return the resulting
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1532	Unicode string.
				1533
				1534
.. c:function:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
                              Py_ssize_t start, Py_ssize_t end, int direction)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1537
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1538	Return 1 if substr matches ``str[start:end]`` at the given tail end
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1539	(direction == -1 means to do a prefix match, direction == 1 a suffix match),
				1540	0 otherwise. Return ``-1`` if an error occurred.
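
The meaning of *direction* can be shown on ordinary C strings. This is a simplified sketch rather than the real function: ``sketch_tailmatch`` is a hypothetical helper, it works on bytes instead of Unicode objects, and it assumes non-negative indices (an *end* past the string is clamped).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C API). Returns 1 if substr
   matches the chosen end of str[start:end], 0 otherwise. */
int
sketch_tailmatch(const char *str, const char *substr,
                 size_t start, size_t end, int direction)
{
    size_t len, sublen;

    assert(str != NULL && substr != NULL);
    len = strlen(str);
    sublen = strlen(substr);
    if (end > len)
        end = len;
    if (start > end || sublen > end - start)
        return 0;                       /* slice too short to match */
    if (direction > 0)
        start = end - sublen;           /* suffix: compare at the tail */
    /* direction == -1: prefix, compare at the head of the slice */
    return memcmp(str + start, substr, sublen) == 0;
}
```

Note that the match is tested against the slice, so ``"world"`` is a suffix of ``"hello world"`` for ``end == 11`` but not for ``end == 5``.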
				1541
				1542
.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
                              Py_ssize_t start, Py_ssize_t end, int direction)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1545
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1546	Return the first position of substr in ``str[start:end]`` using the given
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1547	direction (direction == 1 means to do a forward search, direction == -1 a
				1548	backward search). The return value is the index of the first match; a value of
				1549	``-1`` indicates that no match was found, and ``-2`` indicates that an error
				1550	occurred and an exception has been set.
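
The return convention (an index, ``-1`` for "not found", ``-2`` for an error) and the effect of *direction* can be sketched on C strings. The helper below is hypothetical and simplified: it works on bytes, assumes non-negative indices, and uses a *NULL* argument as its only error case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C API). Searches str[start:end]
   forwards (direction == 1) or backwards (direction == -1). */
ptrdiff_t
sketch_find(const char *str, const char *substr,
            size_t start, size_t end, int direction)
{
    size_t len, sublen, i;

    if (str == NULL || substr == NULL)
        return -2;                      /* error, distinct from "no match" */
    len = strlen(str);
    sublen = strlen(substr);
    if (end > len)
        end = len;
    assert(end <= len);                 /* slice is now inside the string */
    if (start > end || sublen > end - start)
        return -1;
    if (direction > 0) {
        for (i = start; i + sublen <= end; i++) {
            if (memcmp(str + i, substr, sublen) == 0)
                return (ptrdiff_t)i;
        }
    }
    else {
        /* Walk candidate positions from the last possible one down to
           start. */
        for (i = end - sublen + 1; i-- > start; ) {
            if (memcmp(str + i, substr, sublen) == 0)
                return (ptrdiff_t)i;
        }
    }
    return -1;
}
```

Callers therefore need to test for ``-2`` separately before treating a negative result as "no match".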
				1551
				1552
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1553	.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
				1554	Py_ssize_t start, Py_ssize_t end, int direction)
Martin v. Löwis	d63a3b8	2011-09-28 07:41:54 +0200	[diff] [blame]	1555
				1556	Return the first position of the character ch in ``str[start:end]`` using
				1557	the given direction (direction == 1 means to do a forward search,
				1558	direction == -1 a backward search). The return value is the index of the
				1559	first match; a value of ``-1`` indicates that no match was found, and ``-2``
				1560	indicates that an error occurred and an exception has been set.
				1561
Georg Brandl	ee12f44	2011-09-28 21:51:06 +0200	[diff] [blame]	1562	.. versionadded:: 3.3
				1563
Martin v. Löwis	d63a3b8	2011-09-28 07:41:54 +0200	[diff] [blame]	1564
.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
                              Py_ssize_t start, Py_ssize_t end)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1567
				1568	Return the number of non-overlapping occurrences of substr in
				1569	``str[start:end]``. Return ``-1`` if an error occurred.
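
"Non-overlapping" means that the scan resumes after each match instead of one position later. A minimal sketch on C strings follows; ``sketch_count`` is a hypothetical helper, it assumes non-negative indices, and it treats a *NULL* argument as its only error case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C API). Counts non-overlapping
   occurrences of substr in str[start:end]; returns -1 on error. */
ptrdiff_t
sketch_count(const char *str, const char *substr, size_t start, size_t end)
{
    size_t len, sublen, i;
    ptrdiff_t count = 0;

    if (str == NULL || substr == NULL)
        return -1;
    len = strlen(str);
    sublen = strlen(substr);
    if (end > len)
        end = len;
    assert(end <= len);                 /* slice is now inside the string */
    if (start > end || sublen > end - start)
        return 0;
    if (sublen == 0)
        return (ptrdiff_t)(end - start) + 1;  /* empty match at each gap */
    i = start;
    while (i + sublen <= end) {
        if (memcmp(str + i, substr, sublen) == 0) {
            count++;
            i += sublen;                /* skip past the match */
        }
        else {
            i++;
        }
    }
    return count;
}
```

For instance, ``"aa"`` occurs twice in ``"aaaa"`` under this rule, not three times.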
				1570
				1571
.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
                              PyObject *replstr, Py_ssize_t maxcount)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1574
				1575	Replace at most maxcount occurrences of substr in str with replstr and
				1576	return the resulting Unicode object. maxcount == -1 means replace all
				1577	occurrences.
				1578
				1579
.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1581
				1582	Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
				1583	respectively.
				1584
				1585
.. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, char *string)
Benjamin Peterson	c22ed14	2008-07-01 19:12:34 +0000	[diff] [blame]	1587
Compare a unicode object, *uni*, with *string* and return -1, 0, 1 for less
than, equal, and greater than, respectively. It is best to pass only
ASCII-encoded strings, but the function interprets the input string as
ISO-8859-1 if it contains non-ASCII characters.
Benjamin Peterson	c22ed14	2008-07-01 19:12:34 +0000	[diff] [blame]	1592
				1593
.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   :exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
   with a :exc:`UnicodeDecodeError`.

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.


.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``.  The *args* argument must be a tuple.


.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return true or false
   accordingly.

   *element* has to coerce to a one-element Unicode string.  ``-1`` is returned
   if there was an error.


.. c:function:: void PyUnicode_InternInPlace(PyObject **string)

   Intern the argument *\*string* in place.  The argument must be the address of a
   pointer variable pointing to a Python unicode string object.  If there is an
   existing interned string that is the same as *\*string*, it sets *\*string* to
   it (decrementing the reference count of the old string object and incrementing
   the reference count of the interned string object); otherwise it leaves
   *\*string* alone and interns it (incrementing its reference count).
   (Clarification: even though there is a lot of talk about reference counts, think
   of this function as reference-count-neutral; you own the object after the call
   if and only if you owned it before the call.)


.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)

   A combination of :c:func:`PyUnicode_FromString` and
   :c:func:`PyUnicode_InternInPlace`, returning either a new unicode string
   object that has been interned, or a new ("owned") reference to an earlier
   interned string object with the same value.