.. highlight:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
.. sectionauthor:: Georg Brandl <georg@python.org>

Unicode Objects
^^^^^^^^^^^^^^^

Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
use a variety of representations so that they can handle the complete range of
Unicode characters while staying memory efficient.  There are special cases for
strings where all code points are below 128, 256, or 65536; otherwise, code
points must be below 1114112 (which is the full Unicode range).
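
The size classes above can be illustrated with a small standalone C sketch.
The helper ``pep393_char_width`` is hypothetical and not part of the C API; it
only restates the thresholds given in this section, using ``<stdint.h>`` types
in place of :c:type:`Py_UCS4`, and it does not model the extra bookkeeping
CPython keeps for pure-ASCII strings.

.. code-block:: c

   #include <stdint.h>

   /* Hypothetical helper: number of bytes per character that a PEP 393
      string would use, given the largest code point it contains. */
   static int pep393_char_width(uint32_t maxchar)
   {
       if (maxchar < 256)
           return 1;   /* ASCII (< 128) and Latin-1 (< 256): 1 byte each */
       if (maxchar < 65536)
           return 2;   /* rest of the Basic Multilingual Plane: 2 bytes */
       return 4;       /* up to U+10FFFF (below 1114112): 4 bytes */
   }

Under this rule, a string whose largest code point is U+20AC EURO SIGN would be
stored with 2 bytes per character.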

:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
in the Unicode object.  The :c:type:`Py_UNICODE*` representation is deprecated
and inefficient; it should be avoided in performance- or memory-sensitive
situations.

Due to the transition between the old APIs and the new APIs, Unicode objects
can internally be in two states depending on how they were created:

* "canonical" Unicode objects are all objects created by a non-deprecated
  Unicode API.  They use the most efficient representation allowed by the
  implementation.

* "legacy" Unicode objects have been created through one of the deprecated
  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only carry the
  :c:type:`Py_UNICODE*` representation; you will have to call
  :c:func:`PyUnicode_READY` on them before calling any other API.

.. note::
   The "legacy" Unicode objects will be removed in Python 3.12, together with
   the deprecated APIs.  From then on, all Unicode objects will be "canonical".
   See :pep:`623` for more information.


Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:

.. c:type:: Py_UCS4
            Py_UCS2
            Py_UCS1

   These types are typedefs for unsigned integer types wide enough to contain
   characters of 32 bits, 16 bits and 8 bits, respectively.  When dealing with
   single Unicode characters, use :c:type:`Py_UCS4`.

   .. versionadded:: 3.3


.. c:type:: Py_UNICODE

   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
   depending on the platform.

   .. versionchanged:: 3.3
      In previous versions, this was a 16-bit type or a 32-bit type depending on
      whether you selected a "narrow" or "wide" Unicode version of Python at
      build time.


.. c:type:: PyASCIIObject
            PyCompactUnicodeObject
            PyUnicodeObject

   These subtypes of :c:type:`PyObject` represent a Python Unicode object.  In
   almost all cases, they shouldn't be used directly, since all API functions
   that deal with Unicode objects take and return :c:type:`PyObject` pointers.

   .. versionadded:: 3.3


.. c:var:: PyTypeObject PyUnicode_Type

   This instance of :c:type:`PyTypeObject` represents the Python Unicode type.  It
   is exposed to Python code as ``str``.


The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:

.. c:function:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.


.. c:function:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.


.. c:function:: int PyUnicode_READY(PyObject *o)

   Ensure the string object *o* is in the "canonical" representation.  This is
   required before using any of the access macros described below.

   .. XXX expand on when it is not required

   Returns ``0`` on success and ``-1`` with an exception set on failure, which in
   particular happens if memory allocation fails.

   .. versionadded:: 3.3

   .. deprecated-removed:: 3.10 3.12
      This API will be removed with :c:func:`PyUnicode_FromUnicode`.


.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)

   Return the length of the Unicode string, in code points.  *o* has to be a
   Unicode object in the "canonical" representation (not checked).

   .. versionadded:: 3.3


.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
                Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
                Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)

   Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
   integer types for direct character access.  No check is performed that the
   canonical representation has the correct character size; use
   :c:func:`PyUnicode_KIND` to select the right macro.  Make sure
   :c:func:`PyUnicode_READY` has been called before accessing this.

   .. versionadded:: 3.3


.. c:macro:: PyUnicode_WCHAR_KIND
             PyUnicode_1BYTE_KIND
             PyUnicode_2BYTE_KIND
             PyUnicode_4BYTE_KIND

   Return values of the :c:func:`PyUnicode_KIND` macro.

   .. versionadded:: 3.3

   .. deprecated-removed:: 3.10 3.12
      ``PyUnicode_WCHAR_KIND`` is deprecated.


.. c:function:: int PyUnicode_KIND(PyObject *o)

   Return one of the PyUnicode kind constants (see above), indicating how many
   bytes per character this Unicode object uses to store its data.  *o* has to
   be a Unicode object in the "canonical" representation (not checked).

   .. XXX document "0" return value?

   .. versionadded:: 3.3


.. c:function:: void* PyUnicode_DATA(PyObject *o)

   Return a void pointer to the raw Unicode buffer.  *o* has to be a Unicode
   object in the "canonical" representation (not checked).

   .. versionadded:: 3.3


.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
                                     Py_UCS4 value)

   Write into a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`).  This macro does not do any sanity checks and is
   intended for use in loops.  The caller should cache the *kind* value and
   *data* pointer as obtained from other macro calls.  *index* is the index in
   the string (starts at 0) and *value* is the new code point value which should
   be written to that location.

   .. versionadded:: 3.3
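
   The access pattern described above (fetch *kind* and *data* once, then write
   in a loop) can be modelled in plain C.  The sketch below compiles without
   :file:`Python.h`: the ``KIND_*`` constants, ``sketch_write`` and
   ``sketch_fill`` are hypothetical stand-ins for the ``PyUnicode_*BYTE_KIND``
   constants and :c:func:`PyUnicode_WRITE`, and, like the macro, the sketch
   performs no bounds or range checks.

   .. code-block:: c

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical stand-ins for the PyUnicode_*BYTE_KIND constants. */
      enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

      /* Model of PyUnicode_WRITE: store `value` at position `index` of a
         buffer whose element width is selected by `kind`.  No checks: the
         caller guarantees that `index` is in range and `value` fits. */
      static void sketch_write(int kind, void *data, ptrdiff_t index,
                               uint32_t value)
      {
          if (kind == KIND_1BYTE)
              ((uint8_t *)data)[index] = (uint8_t)value;
          else if (kind == KIND_2BYTE)
              ((uint16_t *)data)[index] = (uint16_t)value;
          else
              ((uint32_t *)data)[index] = value;
      }

      /* Loop in the recommended shape: kind and data are computed once by
         the caller and passed in, not recomputed for each character. */
      static void sketch_fill(int kind, void *data, ptrdiff_t length,
                              uint32_t value)
      {
          for (ptrdiff_t i = 0; i < length; i++)
              sketch_write(kind, data, i, value);
      }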


.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)

   Read a code point from a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`).  No checks or ready calls are performed.

   .. versionadded:: 3.3
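
   Reading follows the same dispatch on *kind*.  The plain-C sketch below
   (``sketch_read`` and ``sketch_max_code_point`` are hypothetical names, and
   ``<stdint.h>`` types stand in for :c:type:`Py_UCS1`, :c:type:`Py_UCS2` and
   :c:type:`Py_UCS4`) shows the typical loop: *kind* and *data* are looked up
   once, and each iteration only indexes the buffer.

   .. code-block:: c

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical stand-ins for the PyUnicode_*BYTE_KIND constants. */
      enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

      /* Model of PyUnicode_READ: return the code point at `index`. */
      static uint32_t sketch_read(int kind, const void *data, ptrdiff_t index)
      {
          if (kind == KIND_1BYTE)
              return ((const uint8_t *)data)[index];
          if (kind == KIND_2BYTE)
              return ((const uint16_t *)data)[index];
          return ((const uint32_t *)data)[index];
      }

      /* Exact largest code point, found by scanning every character. */
      static uint32_t sketch_max_code_point(int kind, const void *data,
                                            ptrdiff_t length)
      {
          uint32_t max = 0;
          for (ptrdiff_t i = 0; i < length; i++) {
              uint32_t ch = sketch_read(kind, data, i);
              if (ch > max)
                  max = ch;
          }
          return max;
      }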


.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)

   Read a character from a Unicode object *o*, which must be in the "canonical"
   representation.  This is less efficient than :c:func:`PyUnicode_READ` if you
   do multiple consecutive reads.

   .. versionadded:: 3.3


.. c:macro:: PyUnicode_MAX_CHAR_VALUE(o)

   Return the maximum code point that is suitable for creating another string
   based on *o*, which must be in the "canonical" representation.  This is
   always an approximation but more efficient than iterating over the string.

   .. versionadded:: 3.3
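
   As a rough model, the approximation depends only on how the string is
   stored, not on the characters it contains.  The sketch below assumes the
   bound is chosen from the storage kind plus a pure-ASCII flag, giving the
   values 127, 255, 65535 and 1114111; the names are hypothetical, and this is
   an illustration rather than a specification of the macro.

   .. code-block:: c

      #include <stdbool.h>
      #include <stdint.h>

      /* Hypothetical stand-ins for the PyUnicode_*BYTE_KIND constants. */
      enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

      /* Upper bound on the code points of a string with this layout. */
      static uint32_t sketch_max_char_value(int kind, bool is_ascii)
      {
          if (is_ascii)
              return 0x7F;       /* 127 */
          if (kind == KIND_1BYTE)
              return 0xFF;       /* 255 */
          if (kind == KIND_2BYTE)
              return 0xFFFF;     /* 65535 */
          return 0x10FFFF;       /* 1114111 */
      }

   Under this model, a one-byte non-ASCII string such as ``"caf\u00e9"`` gets
   the bound 255 even though its largest code point is 233.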


.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
   code units (this includes surrogate pairs as 2 units).  *o* has to be a
   Unicode object (not checked).

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.


.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
   bytes.  *o* has to be a Unicode object (not checked).

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.
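
   The gap between code units and code points only appears where
   :c:type:`Py_UNICODE` (:c:type:`wchar_t`) is 16 bits wide, so that code
   points above U+FFFF need a surrogate pair.  The hypothetical helper below
   (plain C, not part of the API) counts code units for either unit width;
   multiplying the count by the unit size gives the corresponding size in
   bytes.

   .. code-block:: c

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical helper: wchar_t code units needed for `n` code points
         when each unit is `unit_bytes` (2 or 4) bytes wide.  With 2-byte
         units, code points above U+FFFF take a surrogate pair. */
      static size_t sketch_code_units(const uint32_t *code_points, size_t n,
                                      size_t unit_bytes)
      {
          size_t units = 0;
          for (size_t i = 0; i < n; i++)
              units += (unit_bytes == 2 && code_points[i] > 0xFFFF) ? 2 : 1;
          return units;
      }

   For ``"A\U0001F600"`` this gives 3 units (6 bytes) with a 16-bit
   :c:type:`wchar_t` and 2 units (8 bytes) with a 32-bit one, while
   :c:func:`PyUnicode_GET_LENGTH` reports 2 in both cases.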


.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
                const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to a :c:type:`Py_UNICODE` representation of the object.  The
   returned buffer is always terminated with an extra null code point.  It
   may also contain embedded null code points, which would cause the string
   to be truncated when used in most C functions.  The ``AS_DATA`` form
   casts the pointer to :c:type:`const char *`.  The *o* argument has to be
   a Unicode object (not checked).

   .. versionchanged:: 3.3
      This macro is now inefficient, because in many cases the
      :c:type:`Py_UNICODE` representation does not exist and needs to be
      created, and it can fail (return ``NULL`` with an exception set).  Try to
      port the code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using the
      :c:func:`PyUnicode_nBYTE_DATA` family of macros.


.. c:function:: int PyUnicode_IsIdentifier(PyObject *o)

   Return ``1`` if the string is a valid identifier according to the language
   definition, section :ref:`identifiers`.  Return ``0`` otherwise.

   .. versionchanged:: 3.9
      The function does not call :c:func:`Py_FatalError` anymore if the string
      is not ready.


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties.  The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.


.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a whitespace character.


.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a lowercase character.


.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an uppercase character.


.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a titlecase character.


.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a linebreak character.


.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a decimal character.


.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a digit character.


.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a numeric character.


.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an alphabetic character.


.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an alphanumeric character.

.. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a printable character.
   Nonprintable characters are those characters defined in the Unicode character
   database as "Other" or "Separator", except for the ASCII space (0x20), which
   is considered printable.  (Note that printable characters in this context are
   those which should not be escaped when :func:`repr` is invoked on a string.
   It has no bearing on the handling of strings written to :data:`sys.stdout` or
   :data:`sys.stderr`.)


These APIs can be used for fast direct character conversions:


.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a non-negative decimal integer.
   Return ``-1`` if this is not possible.  This macro does not raise exceptions.


.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer.  Return ``-1`` if
   this is not possible.  This macro does not raise exceptions.


.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double.  Return ``-1.0`` if this is not
   possible.  This macro does not raise exceptions.


These APIs can be used to work with surrogates:

.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)

   Check if *ch* is a surrogate (``0xD800 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)

   Check if *ch* is a high surrogate (``0xD800 <= ch <= 0xDBFF``).

.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)

   Check if *ch* is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)

   Join two surrogate characters and return a single Py_UCS4 value.
   *high* and *low* are respectively the leading and trailing surrogates in a
   surrogate pair.
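
The arithmetic behind these macros is that of UTF-16: a high surrogate carries
the upper 10 bits and a low surrogate the lower 10 bits of the offset from
U+10000.  The plain-C sketch below uses hypothetical lowercase names rather than
the real macros, and ``split_code_point`` (the inverse of joining) is an
illustrative addition with no counterpart documented here.

.. code-block:: c

   #include <stdint.h>

   static int is_surrogate(uint32_t ch)      { return ch >= 0xD800 && ch <= 0xDFFF; }
   static int is_high_surrogate(uint32_t ch) { return ch >= 0xD800 && ch <= 0xDBFF; }
   static int is_low_surrogate(uint32_t ch)  { return ch >= 0xDC00 && ch <= 0xDFFF; }

   /* Combine a leading (high) and trailing (low) surrogate. */
   static uint32_t join_surrogates(uint32_t high, uint32_t low)
   {
       return (((high & 0x03FF) << 10) | (low & 0x03FF)) + 0x10000;
   }

   /* Inverse: split a code point in U+10000..U+10FFFF into a pair. */
   static void split_code_point(uint32_t cp, uint32_t *high, uint32_t *low)
   {
       cp -= 0x10000;
       *high = 0xD800 | (cp >> 10);
       *low = 0xDC00 | (cp & 0x03FF);
   }

For example, the pair U+D83D U+DE00 joins to U+1F600, and splitting U+1F600
gives the same pair back.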


Creating and accessing Unicode strings
""""""""""""""""""""""""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:

.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)

   Create a new Unicode object.  *maxchar* should be the true maximum code point
   to be placed in the string.  As an approximation, it can be rounded up to the
   nearest value in the sequence 127, 255, 65535, 1114111.

   This is the recommended way to allocate a new Unicode object.  Objects
   created using this function are not resizable.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
                                                    Py_ssize_t size)

   Create a new Unicode object with the given *kind* (possible values are
   :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
   :c:func:`PyUnicode_KIND`).  The *buffer* must point to an array of *size*
   units of 1, 2 or 4 bytes per character, as given by the kind.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the char buffer *u*.  The bytes will be
   interpreted as being UTF-8 encoded.  The buffer is copied into the new
   object.  If the buffer is not ``NULL``, the return value might be a shared
   object, i.e. modification of the data is not allowed.

   If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
   with the buffer set to ``NULL``.  This usage is deprecated in favor of
   :c:func:`PyUnicode_New`.


.. c:function:: PyObject* PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.


.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)

   Take a C :c:func:`printf`\ -style *format* string and a variable number of
   arguments, calculate the size of the resulting Python Unicode string and return
   a string with the values formatted into it.  The variable arguments must be C
   types and must correspond exactly to the format characters in the *format*
   ASCII-encoded string.  The following format characters are allowed:

   .. % This should be exactly the same as the table in PyErr_Format.
   .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
   .. % because not all compilers support the %z width modifier -- we fake it
   .. % when necessary via interpolating PY_FORMAT_SIZE_T.
   .. % Similar comments apply to the %ll width modifier and

   .. tabularcolumns:: |l|l|L|

   +-------------------+---------------------+----------------------------------+
   | Format Characters | Type                | Comment                          |
   +===================+=====================+==================================+
   | :attr:`%%`        | *n/a*               | The literal % character.         |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%c`        | int                 | A single character,              |
   |                   |                     | represented as a C int.          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%d`        | int                 | Equivalent to                    |
   |                   |                     | ``printf("%d")``. [1]_           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%u`        | unsigned int        | Equivalent to                    |
   |                   |                     | ``printf("%u")``. [1]_           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%ld`       | long                | Equivalent to                    |
   |                   |                     | ``printf("%ld")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%li`       | long                | Equivalent to                    |
   |                   |                     | ``printf("%li")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%lu`       | unsigned long       | Equivalent to                    |
   |                   |                     | ``printf("%lu")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%lld`      | long long           | Equivalent to                    |
   |                   |                     | ``printf("%lld")``. [1]_         |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%lli`      | long long           | Equivalent to                    |
   |                   |                     | ``printf("%lli")``. [1]_         |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%llu`      | unsigned long long  | Equivalent to                    |
   |                   |                     | ``printf("%llu")``. [1]_         |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Equivalent to                    |
   |                   |                     | ``printf("%zd")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%zi`       | Py_ssize_t          | Equivalent to                    |
   |                   |                     | ``printf("%zi")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%zu`       | size_t              | Equivalent to                    |
   |                   |                     | ``printf("%zu")``. [1]_          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%i`        | int                 | Equivalent to                    |
   |                   |                     | ``printf("%i")``. [1]_           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%x`        | int                 | Equivalent to                    |
   |                   |                     | ``printf("%x")``. [1]_           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%s`        | const char\*        | A null-terminated C character    |
   |                   |                     | array.                           |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%p`        | const void\*        | The hex representation of a C    |
   |                   |                     | pointer. Mostly equivalent to    |
   |                   |                     | ``printf("%p")`` except that     |
   |                   |                     | it is guaranteed to start with   |
   |                   |                     | the literal ``0x`` regardless    |
   |                   |                     | of what the platform's           |
   |                   |                     | ``printf`` yields.               |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%A`        | PyObject\*          | The result of calling            |
   |                   |                     | :func:`ascii`.                   |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%U`        | PyObject\*          | A Unicode object.                |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%V`        | PyObject\*,         | A Unicode object (which may be   |
   |                   | const char\*        | ``NULL``) and a null-terminated  |
   |                   |                     | C character array as a second    |
   |                   |                     | parameter (which will be used,   |
   |                   |                     | if the first parameter is        |
   |                   |                     | ``NULL``).                       |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling            |
   |                   |                     | :c:func:`PyObject_Str`.          |
   +-------------------+---------------------+----------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling            |
   |                   |                     | :c:func:`PyObject_Repr`.         |
   +-------------------+---------------------+----------------------------------+

An unrecognized format character causes all the rest of the format string to be
copied as-is to the result string, and any extra arguments to be discarded.

.. note::
   The width formatter unit is the number of characters rather than bytes.
   The precision formatter unit is the number of bytes for ``"%s"`` and
   ``"%V"`` (if the ``PyObject*`` argument is ``NULL``), and the number of
   characters for ``"%A"``, ``"%U"``, ``"%S"``, ``"%R"`` and ``"%V"``
   (if the ``PyObject*`` argument is not ``NULL``).

.. [1] For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi,
   zu, i, x): the 0-conversion flag has effect even when a precision is given.

.. versionchanged:: 3.2
   Support for ``"%lld"`` and ``"%llu"`` added.

.. versionchanged:: 3.3
   Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.

.. versionchanged:: 3.4
   Support for width and precision formatters for ``"%s"``, ``"%A"``,
   ``"%U"``, ``"%V"``, ``"%S"``, ``"%R"`` added.

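For illustration, the sketch below combines several of the format characters
from the table above. The helper ``describe_item`` is a hypothetical name, and
the sketch assumes *name* is an actual :class:`str` object, because ``%U`` does
not accept other types:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: returns a new reference, or NULL with an
      exception set. name must be an actual str object because of %U. */
   static PyObject *
   describe_item(PyObject *name, Py_ssize_t count, const char *tag)
   {
       /* %U   inserts the str object as-is
          %zd  formats a Py_ssize_t
          %R   inserts repr(name)
          %.3s copies at most 3 bytes of the C string tag */
       return PyUnicode_FromFormat("%U owns %zd item(s); repr=%R, tag=%.3s",
                                   name, count, name, tag);
   }

When the argument might not be a :class:`str`, ``%S`` is the safer choice,
since it calls :c:func:`PyObject_Str` on any object.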

.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)

   Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
   arguments: the *format* string and a ``va_list`` holding the values to
   format.

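The usual reason to call :c:func:`PyUnicode_FromFormatV` is to write a variadic
wrapper of your own. A minimal sketch, where ``tagged_message`` and its
``"[app] "`` prefix are hypothetical:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>
   #include <stdarg.h>

   /* Hypothetical variadic wrapper: formats the message with the caller's
      arguments, then prefixes it with a fixed tag. Returns a new reference,
      or NULL with an exception set. */
   static PyObject *
   tagged_message(const char *format, ...)
   {
       va_list vargs;
       va_start(vargs, format);
       PyObject *body = PyUnicode_FromFormatV(format, vargs);
       va_end(vargs);
       if (body == NULL)
           return NULL;

       PyObject *result = PyUnicode_FromFormat("[app] %U", body);
       Py_DECREF(body);
       return result;
   }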

.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
                               const char *encoding, const char *errors)

   Decode an encoded object *obj* to a Unicode object.

   :class:`bytes`, :class:`bytearray` and other
   :term:`bytes-like objects <bytes-like object>`
   are decoded according to the given *encoding* and using the error handling
   defined by *errors*. Both can be ``NULL`` to have the interface use the default
   values (see :ref:`builtincodecs` for details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns ``NULL`` if there was an error. The caller is responsible for
   decref'ing the returned object.

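A minimal sketch of decoding a bytes-like argument strictly as UTF-8; the
helper name ``decode_utf8_strict`` is hypothetical. Passing a :class:`str`
makes it fail with :exc:`TypeError`, and malformed input makes it fail with
:exc:`UnicodeDecodeError`:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: decode bytes, bytearray or another bytes-like
      object as UTF-8 with the "strict" error handler. Returns a new
      reference, or NULL with an exception set. */
   static PyObject *
   decode_utf8_strict(PyObject *data)
   {
       return PyUnicode_FromEncodedObject(data, "utf-8", "strict");
   }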

.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)

   Return the length of the Unicode object, in code points.

   .. versionadded:: 3.3

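The length in code points generally differs from the size of an encoded form.
The sketch below (the helper ``text_lengths`` is hypothetical) reports both,
using :c:func:`PyUnicode_AsUTF8AndSize` for the UTF-8 size; for ``"héllo"`` it
reports 5 code points and 6 UTF-8 bytes:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: store the length in code points and the size of
      the UTF-8 encoding. Returns 0 on success, -1 with an exception set. */
   static int
   text_lengths(PyObject *text, Py_ssize_t *code_points, Py_ssize_t *utf8_bytes)
   {
       Py_ssize_t n = PyUnicode_GetLength(text);
       if (n < 0)
           return -1;
       /* The UTF-8 buffer is cached in the object; it must not be freed. */
       if (PyUnicode_AsUTF8AndSize(text, utf8_bytes) == NULL)
           return -1;
       *code_points = n;
       return 0;
   }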

.. c:function:: Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, \
                                                    Py_ssize_t to_start, \
                                                    PyObject *from, \
                                                    Py_ssize_t from_start, \
                                                    Py_ssize_t how_many)

   Copy up to *how_many* characters of *from*, starting at index *from_start*,
   into *to*, starting at index *to_start*. This function performs character
   conversion when necessary and falls back to :c:func:`memcpy` if possible.
   Returns ``-1`` and sets an exception on error, otherwise returns the number
   of copied characters.

   .. versionadded:: 3.3

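As a sketch of how :c:func:`PyUnicode_CopyCharacters` pairs with
:c:func:`PyUnicode_New`, the helper below concatenates two strings into a
freshly allocated result sized with the ``PyUnicode_MAX_CHAR_VALUE`` macro. The
name ``concat_two`` is hypothetical; real code would simply call
:c:func:`PyUnicode_Concat`:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Illustrative only. Returns a new reference, or NULL with an
      exception set. */
   static PyObject *
   concat_two(PyObject *a, PyObject *b)
   {
       Py_ssize_t len_a = PyUnicode_GetLength(a);
       if (len_a < 0)
           return NULL;
       Py_ssize_t len_b = PyUnicode_GetLength(b);
       if (len_b < 0)
           return NULL;

       /* The result must be wide enough for the largest character of both. */
       Py_UCS4 maxchar = Py_MAX(PyUnicode_MAX_CHAR_VALUE(a),
                                PyUnicode_MAX_CHAR_VALUE(b));
       PyObject *result = PyUnicode_New(len_a + len_b, maxchar);
       if (result == NULL)
           return NULL;

       /* Copy a into result[0:len_a], then b into result[len_a:]. */
       if (PyUnicode_CopyCharacters(result, 0, a, 0, len_a) < 0 ||
           PyUnicode_CopyCharacters(result, len_a, b, 0, len_b) < 0) {
           Py_DECREF(result);
           return NULL;
       }
       return result;
   }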

.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
                        Py_ssize_t length, Py_UCS4 fill_char)

   Fill a string with a character: write *fill_char* into
   ``unicode[start:start+length]``.

   Fail if *fill_char* is bigger than the string maximum character, or if the
   string has more than 1 reference.

   Return the number of written characters, or return ``-1`` and raise an
   exception on error.

   .. versionadded:: 3.3

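A minimal sketch that builds a string of *count* copies of one character;
``repeat_char`` is a hypothetical name. The empty string that
:c:func:`PyUnicode_New` returns for a length of zero is shared, so the sketch
returns it without calling :c:func:`PyUnicode_Fill`, which rejects a string
with more than one reference:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: return a new string of count copies of ch,
      or NULL with an exception set. */
   static PyObject *
   repeat_char(Py_UCS4 ch, Py_ssize_t count)
   {
       /* Passing ch as maxchar picks a representation that can hold it. */
       PyObject *result = PyUnicode_New(count, ch);
       if (result == NULL)
           return NULL;
       if (count == 0)
           return result;   /* shared empty string: nothing to fill */
       if (PyUnicode_Fill(result, 0, count, ch) < 0) {
           Py_DECREF(result);
           return NULL;
       }
       return result;
   }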

.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
                                        Py_UCS4 character)

   Write a character to a string. The string must have been created through
   :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
   the string must not be shared or have been hashed yet.

   This function checks that *unicode* is a Unicode object, that the index is
   not out of bounds, and that the object can be modified safely (i.e. that
   its reference count is one). Return ``0`` on success, or ``-1`` with an
   exception set on error.

   .. versionadded:: 3.3

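A sketch of filling a fresh string one character at a time; ``first_letters``
is a hypothetical helper. The string comes from :c:func:`PyUnicode_New` and is
not shared until it is returned, which is what :c:func:`PyUnicode_WriteChar`
requires:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: return the first n lowercase ASCII letters
      ("abc..."), or NULL with an exception set. */
   static PyObject *
   first_letters(Py_ssize_t n)
   {
       if (n < 0 || n > 26) {
           PyErr_SetString(PyExc_ValueError, "n must be in the range 0..26");
           return NULL;
       }
       /* 'z' is the largest character that will be written. */
       PyObject *result = PyUnicode_New(n, 'z');
       if (result == NULL)
           return NULL;
       for (Py_ssize_t i = 0; i < n; i++) {
           if (PyUnicode_WriteChar(result, i, (Py_UCS4)('a' + i)) < 0) {
               Py_DECREF(result);
               return NULL;
           }
       }
       return result;
   }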

.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)

   Read a character from a string. This function checks that *unicode* is a
   Unicode object and that the index is not out of bounds, in contrast to the
   macro version :c:func:`PyUnicode_READ_CHAR`. Return ``(Py_UCS4)-1`` with an
   exception set on error.

   .. versionadded:: 3.3

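A minimal sketch that counts occurrences of a code point with the checked
accessor; ``count_char`` is a hypothetical name. An error is signalled by
``(Py_UCS4)-1``, which is never a valid code point:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: count how often target occurs in text.
      Returns the count, or -1 with an exception set. */
   static Py_ssize_t
   count_char(PyObject *text, Py_UCS4 target)
   {
       Py_ssize_t n = PyUnicode_GetLength(text);
       if (n < 0)
           return -1;
       Py_ssize_t count = 0;
       for (Py_ssize_t i = 0; i < n; i++) {
           Py_UCS4 ch = PyUnicode_ReadChar(text, i);
           if (ch == (Py_UCS4)-1)
               return -1;
           if (ch == target)
               count++;
       }
       return count;
   }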

.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
                                              Py_ssize_t end)

   Return a substring of *str*, from character index *start* (included) to
   character index *end* (excluded). Negative indices are not supported.

   .. versionadded:: 3.3

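A sketch that splits ``"key=value"`` at the first ``=``; ``split_pair`` is a
hypothetical helper, and it also relies on :c:func:`PyUnicode_FindChar`, which
returns ``-1`` when the character is not found and ``-2`` on error:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: on success stores two new references in *key
      and *value and returns 0; otherwise returns -1 with an exception set. */
   static int
   split_pair(PyObject *text, PyObject **key, PyObject **value)
   {
       Py_ssize_t len = PyUnicode_GetLength(text);
       if (len < 0)
           return -1;
       /* direction 1 searches forward from the start. */
       Py_ssize_t pos = PyUnicode_FindChar(text, '=', 0, len, 1);
       if (pos == -2)
           return -1;
       if (pos == -1) {
           PyErr_SetString(PyExc_ValueError, "missing '='");
           return -1;
       }
       *key = PyUnicode_Substring(text, 0, pos);
       if (*key == NULL)
           return -1;
       *value = PyUnicode_Substring(text, pos + 1, len);
       if (*value == NULL) {
           Py_CLEAR(*key);
           return -1;
       }
       return 0;
   }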

.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
                                          Py_ssize_t buflen, int copy_null)

   Copy the string *u* into a UCS4 buffer, including a null character, if
   *copy_null* is set. Returns ``NULL`` and sets an exception on error (in
   particular, a :exc:`SystemError` if *buflen* is smaller than the length of
   *u*). *buffer* is returned on success.

   .. versionadded:: 3.3

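A minimal sketch of copying into a caller-provided buffer;
``copy_to_fixed_buffer`` is a hypothetical name. With *copy_null* set, the
buffer must also have room for the terminating null, so *buflen* has to exceed
the length of the string:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: copy text into buffer and null-terminate it.
      Returns the number of code points copied, or -1 with an exception set. */
   static Py_ssize_t
   copy_to_fixed_buffer(PyObject *text, Py_UCS4 *buffer, Py_ssize_t buflen)
   {
       Py_ssize_t len = PyUnicode_GetLength(text);
       if (len < 0)
           return -1;
       /* copy_null = 1: needs len + 1 slots, else SystemError is raised. */
       if (PyUnicode_AsUCS4(text, buffer, buflen, 1) == NULL)
           return -1;
       return len;
   }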

.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)

   Copy the string *u* into a new UCS4 buffer that is allocated using
   :c:func:`PyMem_Malloc`. If this fails, ``NULL`` is returned with a
   :exc:`MemoryError` set. The returned buffer always has an extra
   null code point appended.

   .. versionadded:: 3.3

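A minimal sketch that sums the code points of a string; ``sum_code_points`` is
a hypothetical name. The buffer is released with :c:func:`PyMem_Free`, matching
the :c:func:`PyMem_Malloc` allocation:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: return the sum of all code points in text,
      or -1 with an exception set. */
   static long long
   sum_code_points(PyObject *text)
   {
       Py_ssize_t len = PyUnicode_GetLength(text);
       if (len < 0)
           return -1;
       Py_UCS4 *buf = PyUnicode_AsUCS4Copy(text);   /* buf[len] == 0 */
       if (buf == NULL)
           return -1;
       long long total = 0;
       for (Py_ssize_t i = 0; i < len; i++)
           total += buf[i];
       PyMem_Free(buf);
       return total;
   }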

Deprecated Py_UNICODE APIs
""""""""""""""""""""""""""

.. deprecated-removed:: 3.3 4.0

These API functions are deprecated with the implementation of :pep:`393`.
Extension modules can continue using them, as they will not be removed in Python
3.x, but need to be aware that their use can now cause performance and memory hits.


.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u*
   may be ``NULL``, which causes the contents to be undefined. It is the user's
   responsibility to fill in the needed data. The buffer is copied into the new
   object.

   If the buffer is not ``NULL``, the return value might be a shared object.
   Therefore, modification of the resulting Unicode object is only allowed when
   *u* is ``NULL``.

   If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the
   string content has been filled before using any of the access macros such as
   :c:func:`PyUnicode_KIND`.

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_FromKindAndData`, :c:func:`PyUnicode_FromWideChar`, or
      :c:func:`PyUnicode_New`.

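As one migration path, code that called :c:func:`PyUnicode_FromUnicode` with a
buffer of code points can pass the buffer to :c:func:`PyUnicode_FromKindAndData`
instead. A minimal sketch, assuming the data is already a :c:type:`Py_UCS4`
array (``str_from_code_points`` is a hypothetical name):

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: build a str from UCS4 code points without
      Py_UNICODE. The data is copied into the new object. Returns a new
      reference, or NULL with an exception set. */
   static PyObject *
   str_from_code_points(const Py_UCS4 *points, Py_ssize_t size)
   {
       return PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, points, size);
   }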

.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal
   :c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the
   :c:type:`Py_UNICODE*` representation of the object if it is not yet
   available. The buffer is always terminated with an extra null code point.
   Note that the resulting :c:type:`Py_UNICODE` string may also contain
   embedded null code points, which would cause the string to be truncated when
   used in most C functions.

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
      :c:func:`PyUnicode_ReadChar` or similar new APIs.


.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)

   Create a Unicode object by replacing all decimal digits in the
   :c:type:`Py_UNICODE` buffer *s* of the given *size* with ASCII digits 0--9
   according to their decimal value. Return ``NULL`` if an exception occurs.


.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)

   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
   array length (excluding the extra null terminator) in *size*.
   Note that the resulting :c:type:`Py_UNICODE*` string
   may contain embedded null code points, which would cause the string to be
   truncated when used in most C functions.

   .. versionadded:: 3.3

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_AsUCS4`, :c:func:`PyUnicode_AsWideChar`,
      :c:func:`PyUnicode_ReadChar` or similar new APIs.


.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
   code units (this includes surrogate pairs as 2 units).

   .. deprecated-removed:: 3.3 3.12
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.


.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Copy an instance of a Unicode subtype to a new true Unicode object if
   necessary. If *obj* is already a true Unicode object (not a subtype),
   return the reference with incremented refcount.

   Objects other than Unicode or its subtypes will cause a :exc:`TypeError`.

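A minimal sketch of normalizing an argument to an exact :class:`str`;
``to_exact_str`` is a hypothetical name. For an object that is already exactly
:class:`str`, the same object comes back with its reference count incremented:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: accept str or a str subclass and return an
      exact str (new reference); other types raise TypeError. */
   static PyObject *
   to_exact_str(PyObject *obj)
   {
       PyObject *result = PyUnicode_FromObject(obj);
       if (result == NULL)
           return NULL;
       assert(PyUnicode_CheckExact(result));   /* per the description above */
       return result;
   }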

Locale Encoding
"""""""""""""""

The current locale encoding can be used to decode text from the operating
system.

.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, \
                                                        Py_ssize_t len, \
                                                        const char *errors)

   Decode a string from UTF-8 on Android and VxWorks, or from the current
   locale encoding on other platforms. The supported error handlers are
   ``"strict"`` and ``"surrogateescape"`` (:pep:`383`). The decoder uses the
   ``"strict"`` error handler if *errors* is ``NULL``. *str* must end with a
   null character but cannot contain embedded null characters.

   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
   Python startup).

   This function ignores the :ref:`Python UTF-8 Mode <utf8-mode>`.

   .. seealso::

      The :c:func:`Py_DecodeLocale` function.

   .. versionadded:: 3.3

   .. versionchanged:: 3.7
      The function now also uses the current locale encoding for the
      ``surrogateescape`` error handler, except on Android. Previously,
      :c:func:`Py_DecodeLocale` was used for the ``surrogateescape`` error
      handler, and the current locale encoding was used for ``strict``.


.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)

   Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string
   length using :c:func:`strlen`.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)

   Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current
   locale encoding on other platforms. The supported error handlers are
   ``"strict"`` and ``"surrogateescape"`` (:pep:`383`). The encoder uses the
   ``"strict"`` error handler if *errors* is ``NULL``. Return a :class:`bytes`
   object. *unicode* cannot contain embedded null characters.

   Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
   Python startup).

   This function ignores the :ref:`Python UTF-8 Mode <utf8-mode>`.

   .. seealso::

      The :c:func:`Py_EncodeLocale` function.

   .. versionadded:: 3.3

   .. versionchanged:: 3.7
      The function now also uses the current locale encoding for the
      ``surrogateescape`` error handler, except on Android. Previously,
      :c:func:`Py_EncodeLocale` was used for the ``surrogateescape`` error
      handler, and the current locale encoding was used for ``strict``.

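A minimal sketch of a locale round trip with the ``"surrogateescape"`` error
handler; ``locale_round_trip`` is a hypothetical name. With that handler, bytes
that the locale encoding cannot decode are expected to survive the round trip,
although the intermediate text depends on the platform and the locale in
effect:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: decode raw with the locale encoding, then encode
      the result back. Returns a new bytes object, or NULL with an
      exception set. */
   static PyObject *
   locale_round_trip(const char *raw)
   {
       PyObject *text = PyUnicode_DecodeLocale(raw, "surrogateescape");
       if (text == NULL)
           return NULL;
       PyObject *encoded = PyUnicode_EncodeLocale(text, "surrogateescape");
       Py_DECREF(text);
       return encoded;
   }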

File System Encoding
""""""""""""""""""""

To encode and decode file names and other environment strings,
:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
:c:data:`Py_FileSystemDefaultEncodeErrors` should be used as the error handler
(:pep:`383` and :pep:`529`). To encode file names to :class:`bytes` during
argument parsing, the ``"O&"`` converter should be used, passing
:c:func:`PyUnicode_FSConverter` as the conversion function:

.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)

   ParseTuple converter: encode :class:`str` objects -- obtained directly or
   through the :class:`os.PathLike` interface -- to :class:`bytes` using
   :c:func:`PyUnicode_EncodeFSDefault`; :class:`bytes` objects are output as-is.
   *result* must be a :c:type:`PyBytesObject*` which must be released when it is
   no longer used.

   .. versionadded:: 3.1

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

To decode file names to :class:`str` during argument parsing, the ``"O&"``
converter should be used, passing :c:func:`PyUnicode_FSDecoder` as the
conversion function:

.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)

   ParseTuple converter: decode :class:`bytes` objects -- obtained either
   directly or indirectly through the :class:`os.PathLike` interface -- to
   :class:`str` using :c:func:`PyUnicode_DecodeFSDefaultAndSize`; :class:`str`
   objects are output as-is. *result* must be a :c:type:`PyUnicodeObject*` which
   must be released when it is no longer used.

   .. versionadded:: 3.2

   .. versionchanged:: 3.6
      Accepts a :term:`path-like object`.

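A sketch of two hypothetical extension functions, ``path_as_bytes`` and
``path_as_str``, each taking a single path argument (:class:`str`,
:class:`bytes` or :term:`path-like object`) and returning it converted with one
of the two converters. The converter stores a new reference, which each
function then hands to its caller:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical function: return the path argument as bytes. */
   static PyObject *
   path_as_bytes(PyObject *self, PyObject *args)
   {
       (void)self;
       PyObject *path = NULL;   /* receives a new bytes reference */
       if (!PyArg_ParseTuple(args, "O&", PyUnicode_FSConverter, &path))
           return NULL;
       return path;
   }

   /* Hypothetical function: return the path argument as str. */
   static PyObject *
   path_as_str(PyObject *self, PyObject *args)
   {
       (void)self;
       PyObject *path = NULL;   /* receives a new str reference */
       if (!PyArg_ParseTuple(args, "O&", PyUnicode_FSDecoder, &path))
           return NULL;
       return path;
   }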

.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)

   Decode a string from the :term:`filesystem encoding and error handler`.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
   locale encoding and cannot be modified later. If you need to decode a string
   from the current locale encoding, use
   :c:func:`PyUnicode_DecodeLocaleAndSize`.

   .. seealso::

      The :c:func:`Py_DecodeLocale` function.

   .. versionchanged:: 3.6
      Use the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.


.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)

   Decode a null-terminated string from the :term:`filesystem encoding and
   error handler`.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.

   .. versionchanged:: 3.6
      Use the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.

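A minimal sketch of decoding a null-terminated file name and encoding it back
with :c:func:`PyUnicode_EncodeFSDefault`; ``fs_round_trip`` is a hypothetical
name, and the intermediate text for non-ASCII names depends on the filesystem
encoding in effect:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: decode raw_name with the filesystem encoding and
      error handler, then encode it back. Returns a new bytes object, or
      NULL with an exception set. */
   static PyObject *
   fs_round_trip(const char *raw_name)
   {
       PyObject *name = PyUnicode_DecodeFSDefault(raw_name);
       if (name == NULL)
           return NULL;
       PyObject *encoded = PyUnicode_EncodeFSDefault(name);
       Py_DECREF(name);
       return encoded;
   }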

.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)

   Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
   :c:data:`Py_FileSystemDefaultEncodeErrors` error handler, and return
   :class:`bytes`. Note that the resulting :class:`bytes` object may contain
   null bytes.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
   locale encoding and cannot be modified later. If you need to encode a string
   to the current locale encoding, use :c:func:`PyUnicode_EncodeLocale`.

   .. seealso::

      The :c:func:`Py_EncodeLocale` function.

   .. versionadded:: 3.2

   .. versionchanged:: 3.6
      Use the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.

wchar_t Support
"""""""""""""""

:c:type:`wchar_t` support for platforms which support it:

.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given
   *size*. Passing ``-1`` as the *size* indicates that the function must itself
   compute the length, using :c:func:`wcslen`.
   Return ``NULL`` on failure.

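A minimal sketch; ``str_from_wide`` is a hypothetical name. Passing ``-1`` as
the size lets the function measure the null-terminated input itself:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>
   #include <wchar.h>

   /* Hypothetical helper: convert a null-terminated wide string to str.
      Returns a new reference, or NULL with an exception set. */
   static PyObject *
   str_from_wide(const wchar_t *w)
   {
       /* -1: PyUnicode_FromWideChar() computes the length with wcslen(). */
       return PyUnicode_FromWideChar(w, -1);
   }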

.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At
   most *size* :c:type:`wchar_t` characters are copied (excluding a possibly
   trailing null termination character). Return the number of :c:type:`wchar_t`
   characters copied or ``-1`` in case of an error. Note that the resulting
   :c:type:`wchar_t*` string may or may not be null-terminated. It is the
   responsibility of the caller to make sure that the :c:type:`wchar_t*` string
   is null-terminated in case this is required by the application. Also, note
   that the :c:type:`wchar_t*` string might contain null characters, which would
   cause the string to be truncated when used with most C functions.

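Because the result may not be null-terminated, a caller with a fixed-size
buffer can reserve the last slot for the terminator. A minimal sketch of that
pattern, with the hypothetical helper ``to_fixed_wide``; longer strings are
truncated:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>
   #include <wchar.h>

   /* Hypothetical helper: copy text into buf (bufsize slots) and always
      null-terminate it. Returns the number of wchar_t copied, or -1 with
      an exception set. */
   static Py_ssize_t
   to_fixed_wide(PyObject *text, wchar_t *buf, Py_ssize_t bufsize)
   {
       if (bufsize < 1) {
           PyErr_SetString(PyExc_ValueError, "buffer too small");
           return -1;
       }
       /* Copy at most bufsize - 1 characters, keeping room for L'\0'. */
       Py_ssize_t n = PyUnicode_AsWideChar(text, buf, bufsize - 1);
       if (n < 0)
           return -1;
       buf[n] = L'\0';
       return n;
   }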
				961
Victor Stinner	beb4135b	2010-10-07 01:02:42 +0000	[diff] [blame]	962	.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject unicode, Py_ssize_t size)
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	963
				964	Convert the Unicode object to a wide character string. The output string
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	965	always ends with a null character. If size is not ``NULL``, write the number
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	966	of wide characters (excluding the trailing null termination character) into
Serhiy Storchaka	e613e6a	2017-06-27 16:03:14 +0300	[diff] [blame]	967	\size*. Note that the resulting :c:type:`wchar_t` string might contain
				968	null characters, which would cause the string to be truncated when used with
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	969	most C functions. If size is ``NULL`` and the :c:type:`wchar_t*` string
Serhiy Storchaka	e613e6a	2017-06-27 16:03:14 +0300	[diff] [blame]	970	contains null characters a :exc:`ValueError` is raised.
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	971
   Returns a buffer allocated by :c:func:`PyMem_New` (use
   :c:func:`PyMem_Free` to free it) on success. On error, returns ``NULL``
   and *\*size* is undefined. Raises a :exc:`MemoryError` if memory allocation
   fails.
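
   As a minimal sketch (assuming the interpreter is already initialized;
   ``wide_length`` and ``rejects_embedded_null`` are hypothetical helper
   names), the caller releases the buffer with :c:func:`PyMem_Free`, and
   passing ``NULL`` for *size* turns an embedded null character into a
   :exc:`ValueError`::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>
      #include <wchar.h>

      /* Return the number of wide characters in text (excluding the
         trailing null), or -1 with an exception set. */
      static Py_ssize_t
      wide_length(PyObject *text)
      {
          Py_ssize_t size;
          wchar_t *buf = PyUnicode_AsWideCharString(text, &size);
          if (buf == NULL) {
              return -1;              /* size is undefined on error */
          }
          assert(buf[size] == L'\0'); /* always null-terminated */
          PyMem_Free(buf);            /* the caller owns the buffer */
          return size;
      }

      /* Return 1 if size == NULL plus an embedded U+0000 raises
         ValueError, 0 if it does not, -1 on an unrelated failure. */
      static int
      rejects_embedded_null(void)
      {
          PyObject *text = PyUnicode_FromStringAndSize("a\0b", 3);
          if (text == NULL) {
              return -1;
          }
          wchar_t *buf = PyUnicode_AsWideCharString(text, NULL);
          Py_DECREF(text);
          if (buf != NULL) {
              PyMem_Free(buf);
              return 0;
          }
          int matched = PyErr_ExceptionMatches(PyExc_ValueError);
          PyErr_Clear();
          return matched;
      }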

   .. versionadded:: 3.2

   .. versionchanged:: 3.7
      Raises a :exc:`ValueError` if *size* is ``NULL`` and the :c:type:`wchar_t*`
      string contains null characters.


.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as the corresponding parameters of the built-in
:func:`str` string object constructor.

Setting *encoding* to ``NULL`` causes the default encoding to be used,
which is UTF-8. File system calls should use
:c:func:`PyUnicode_FSConverter` for encoding file names. This uses the
variable :c:data:`Py_FileSystemDefaultEncoding` internally. This
variable should be treated as read-only: on some systems, it will be a
pointer to a static string, on others, it will change at run-time
(such as when the application invokes setlocale).

Error handling is set by *errors*, which may also be set to ``NULL`` to use
the default handling defined for the codec. Default error handling for all
built-in codecs is "strict" (a :exc:`UnicodeError`, which is a subclass of
:exc:`ValueError`, is raised).
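
The following is a minimal sketch of how *errors* changes the outcome,
assuming the interpreter is already initialized; the helper names are
hypothetical. The byte ``0xFF`` can never occur in UTF-8, so the default
"strict" handler fails, while the "replace" handler substitutes U+FFFD
REPLACEMENT CHARACTER::

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>
   #include <assert.h>

   /* Return 1 if strict decoding raised a ValueError subclass, else 0. */
   static int
   strict_decode_fails(void)
   {
       /* errors == NULL selects the codec's default, "strict". */
       PyObject *text = PyUnicode_Decode("\xff", 1, "utf-8", NULL);
       if (text != NULL) {
           Py_DECREF(text);
           return 0;
       }
       /* UnicodeDecodeError derives from UnicodeError, hence ValueError. */
       int matched = PyErr_ExceptionMatches(PyExc_ValueError);
       PyErr_Clear();
       return matched;
   }

   /* Return the first code point produced by the "replace" handler,
      or (Py_UCS4)-1 with an exception set. */
   static Py_UCS4
   replace_decode_first_char(void)
   {
       PyObject *text = PyUnicode_Decode("\xff", 1, "utf-8", "replace");
       if (text == NULL) {
           return (Py_UCS4)-1;
       }
       Py_UCS4 ch = PyUnicode_READ_CHAR(text, 0);
       Py_DECREF(text);
       return ch;
   }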

The codecs all use a similar interface. Only deviations from the following
generic ones are documented, for simplicity.


Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
                const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`str` built-in function. The codec to be used is looked up
   using the Python codec registry. Return ``NULL`` if an exception was raised by
   the codec.


.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
                const char *encoding, const char *errors)

   Encode a Unicode object and return the result as a Python bytes object.
   *encoding* and *errors* have the same meaning as the parameters of the same
   name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
   using the Python codec registry. Return ``NULL`` if an exception was raised by
   the codec.
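
   A minimal sketch that combines the two generic functions to transcode
   Latin-1 bytes into UTF-8, assuming an initialized interpreter;
   ``latin1_to_utf8`` is a hypothetical helper, and ``"latin-1"`` and
   ``"utf-8"`` are ordinary codec registry names::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>
      #include <string.h>

      /* Transcode Latin-1 bytes to UTF-8 via a temporary str.  Returns a
         new bytes object, or NULL with an exception set. */
      static PyObject *
      latin1_to_utf8(const char *data, Py_ssize_t len)
      {
          PyObject *text = PyUnicode_Decode(data, len, "latin-1", "strict");
          if (text == NULL) {
              return NULL;
          }
          PyObject *out = PyUnicode_AsEncodedString(text, "utf-8", "strict");
          Py_DECREF(text);            /* the intermediate str is no longer needed */
          return out;
      }

   For example, the four Latin-1 bytes ``caf\xe9`` become the five UTF-8
   bytes ``caf\xc3\xa9``.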


.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
                const char *encoding, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
   bytes object. *encoding* and *errors* have the same meaning as the
   parameters of the same name in the Unicode :meth:`~str.encode` method. The codec
   to be used is looked up using the Python codec registry. Return ``NULL`` if an
   exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsEncodedString`.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return ``NULL`` if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
                const char *errors, Py_ssize_t *consumed)

   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF8`. If
   *consumed* is not ``NULL``, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.
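
   The following sketch shows one way to use *consumed* when input arrives in
   chunks, assuming an initialized interpreter. ``decode_two_chunks`` is a
   hypothetical helper, and its fixed 64-byte scratch buffer is a
   simplification for illustration only::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>
      #include <string.h>

      /* Decode a UTF-8 stream delivered in two chunks.  Bytes of an
         incomplete sequence at the end of chunk1 are carried over and
         prepended to chunk2.  Returns a new str, or NULL on error. */
      static PyObject *
      decode_two_chunks(const char *chunk1, Py_ssize_t len1,
                        const char *chunk2, Py_ssize_t len2)
      {
          char joined[64];            /* illustration only: small fixed buffer */
          Py_ssize_t consumed;

          PyObject *head = PyUnicode_DecodeUTF8Stateful(chunk1, len1, "strict",
                                                        &consumed);
          if (head == NULL) {
              return NULL;
          }
          Py_ssize_t pending = len1 - consumed;   /* undecoded trailing bytes */
          if (pending + len2 > (Py_ssize_t)sizeof(joined)) {
              Py_DECREF(head);
              PyErr_SetString(PyExc_ValueError, "chunk too large for sketch");
              return NULL;
          }
          memcpy(joined, chunk1 + consumed, (size_t)pending);
          memcpy(joined + pending, chunk2, (size_t)len2);

          /* Final chunk: consumed == NULL, so a truncated tail is an error. */
          PyObject *tail = PyUnicode_DecodeUTF8Stateful(joined, pending + len2,
                                                        "strict", NULL);
          if (tail == NULL) {
              Py_DECREF(head);
              return NULL;
          }
          PyObject *result = PyUnicode_Concat(head, tail);
          Py_DECREF(head);
          Py_DECREF(tail);
          return result;
      }

   For example, if U+00E9 (UTF-8 ``C3 A9``) is split so that the first chunk
   ends with ``C3``, the first call decodes only the preceding bytes and
   reports them in *consumed*; the held-back ``C3`` is then decoded together
   with the second chunk.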


.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)

   Encode a Unicode object using UTF-8 and return the result as a Python bytes
   object. Error handling is "strict". Return ``NULL`` if an exception was
   raised by the codec.


.. c:function:: const char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)

   Return a pointer to the UTF-8 encoding of the Unicode object, and
   store the size of the encoded representation (in bytes) in *size*. The
   *size* argument can be ``NULL``; in this case no size will be stored. The
   returned buffer always has an extra null byte appended (not included in
   *size*), regardless of whether there are any other null code points.

   In the case of an error, ``NULL`` is returned with an exception set and no
   *size* is stored.

   This caches the UTF-8 representation of the string in the Unicode object, and
   subsequent calls will return a pointer to the same buffer. The caller is not
   responsible for deallocating the buffer.
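
   A minimal sketch, assuming an initialized interpreter (``utf8_size`` is a
   hypothetical helper). The returned pointer is borrowed from the Unicode
   object, and a string that cannot be encoded, such as one holding a lone
   surrogate, makes the call fail with :exc:`UnicodeEncodeError`::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>

      /* Return the UTF-8 size in bytes of text, or -1 with an exception
         set (e.g. UnicodeEncodeError for a lone surrogate). */
      static Py_ssize_t
      utf8_size(PyObject *text)
      {
          Py_ssize_t size;
          const char *utf8 = PyUnicode_AsUTF8AndSize(text, &size);
          if (utf8 == NULL) {
              return -1;              /* nothing is stored in size on error */
          }
          assert(utf8[size] == '\0'); /* extra null byte, not counted in size */
          /* utf8 is owned by text: do not free it, and do not use it after
             text has been deallocated. */
          return size;
      }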

   .. versionadded:: 3.3

   .. versionchanged:: 3.7
      The return type is now ``const char *`` rather than ``char *``.

   .. versionchanged:: 3.10
      This function is a part of the :ref:`limited API <stable>`.


.. c:function:: const char* PyUnicode_AsUTF8(PyObject *unicode)

   As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.

   .. versionadded:: 3.3

   .. versionchanged:: 3.7
      The return type is now ``const char *`` rather than ``char *``.


.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and
   return a Python bytes object. Return ``NULL`` if an exception was raised by
   the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF8String`, :c:func:`PyUnicode_AsUTF8AndSize` or
      :c:func:`PyUnicode_AsEncodedString`.


UTF-32 Codecs
"""""""""""""

These are the UTF-32 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
                const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-``NULL``) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-``NULL``, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0: native order
      *byteorder == 1: big endian

   If ``*byteorder`` is zero, and the first four bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output.

   After completion, *\*byteorder* is set to the current byte order at the end
   of input data.

   If *byteorder* is ``NULL``, the codec starts in native order mode.

   Return ``NULL`` if an exception was raised by the codec.
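
   A minimal sketch of BOM detection, assuming an initialized interpreter;
   ``decode_utf32_auto`` is a hypothetical helper. Starting with
   ``*byteorder == 0`` lets a leading BOM pick the byte order, and the order
   in effect is reported back through the same variable::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>

      /* Decode UTF-32 data, honouring a leading BOM.  On success, store the
         resulting byte order (-1 or 1 if a BOM was found, otherwise 0) in
         *order_out and return a new str; on failure return NULL. */
      static PyObject *
      decode_utf32_auto(const char *data, Py_ssize_t len, int *order_out)
      {
          int byteorder = 0;   /* 0: native order unless a BOM says otherwise */
          PyObject *text = PyUnicode_DecodeUTF32(data, len, "strict", &byteorder);
          if (text != NULL) {
              *order_out = byteorder;
          }
          return text;
      }

   For example, the eight bytes ``FF FE 00 00 41 00 00 00`` (a little-endian
   BOM followed by ``A``) decode to ``"A"`` with the BOM dropped, and the
   reported byte order is ``-1``.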


.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
                const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF32`. If
   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
   by four) as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.


.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)

   Return a Python bytes object using the UTF-32 encoding in native byte
   order. The result always starts with a BOM. Error handling is "strict".
   Return ``NULL`` if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
                const char *errors, int byteorder)

   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0: native byte order (writes a BOM mark)
      byteorder == 1: big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode BOM
   mark (U+FEFF). In the other two modes, no BOM mark is prepended.

   If ``Py_UNICODE_WIDE`` is not defined, surrogate pairs will be output
   as a single code point.

   Return ``NULL`` if an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF32String` or :c:func:`PyUnicode_AsEncodedString`.


UTF-16 Codecs
"""""""""""""

These are the UTF-16 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
                const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-``NULL``) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-``NULL``, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0: native order
      *byteorder == 1: big endian

   If ``*byteorder`` is zero, and the first two bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output (where it will result in
   either a ``\ufeff`` or a ``\ufffe`` character).

   After completion, *\*byteorder* is set to the current byte order at the end
   of input data.

   If *byteorder* is ``NULL``, the codec starts in native order mode.

   Return ``NULL`` if an exception was raised by the codec.
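
   A minimal sketch contrasting two starting byte orders, assuming an
   initialized interpreter; ``utf16_decoded_length`` is a hypothetical
   helper::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>

      /* Return the number of code points decoded from UTF-16 data when the
         decoder starts with the given byte order, or -1 on error. */
      static Py_ssize_t
      utf16_decoded_length(const char *data, Py_ssize_t len, int start_order)
      {
          int byteorder = start_order;   /* updated in place by the decoder */
          PyObject *text = PyUnicode_DecodeUTF16(data, len, "strict", &byteorder);
          if (text == NULL) {
              return -1;
          }
          Py_ssize_t n = PyUnicode_GET_LENGTH(text);
          Py_DECREF(text);
          return n;
      }

   With the input bytes ``FE FF 00 41``, a starting order of ``0`` lets the
   BOM select big endian and drops it (one code point, ``"A"``), whereas a
   starting order of ``1`` keeps the BOM as U+FEFF (two code points).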


.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
                const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF16`. If
   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.


.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)

   Return a Python bytes object using the UTF-16 encoding in native byte
   order. The result always starts with a BOM. Error handling is "strict".
   Return ``NULL`` if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
                const char *errors, int byteorder)

   Return a Python bytes object holding the UTF-16 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0: native byte order (writes a BOM mark)
      byteorder == 1: big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode BOM
   mark (U+FEFF). In the other two modes, no BOM mark is prepended.

   If ``Py_UNICODE_WIDE`` is defined, a single :c:type:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return ``NULL`` if an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF16String` or :c:func:`PyUnicode_AsEncodedString`.


UTF-7 Codecs
""""""""""""

These are the UTF-7 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
   *s*. Return ``NULL`` if an exception was raised by the codec.
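
   A minimal sketch, assuming an initialized interpreter; ``decode_utf7`` is a
   hypothetical helper for null-terminated input::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>
      #include <string.h>

      /* Decode a null-terminated UTF-7 byte string.  Returns a new str, or
         NULL with an exception set. */
      static PyObject *
      decode_utf7(const char *data)
      {
          return PyUnicode_DecodeUTF7(data, (Py_ssize_t)strlen(data), "strict");
      }

   For example, ``caf+AOk- +-1`` decodes to ``café +1``: ``+AOk-`` is the
   base-64 form of U+00E9, and ``+-`` stands for a literal plus sign.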


.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
                const char *errors, Py_ssize_t *consumed)

   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF7`. If
   *consumed* is not ``NULL``, trailing incomplete UTF-7 base-64 sections will not
   be treated as an error. Those bytes will not be decoded and the number of
   bytes that have been decoded will be stored in *consumed*.


.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
                int base64SetO, int base64WhiteSpace, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using UTF-7 and
   return a Python bytes object. Return ``NULL`` if an exception was raised by
   the codec.

   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
   special meaning) will be encoded in base-64. If *base64WhiteSpace* is
   nonzero, whitespace will be encoded in base-64. Both are set to zero for the
   Python "utf-7" codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsEncodedString`.


Unicode-Escape Codecs
"""""""""""""""""""""

These are the "Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
                Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*. Return ``NULL`` if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Unicode-Escape and return the result as a
   bytes object. Error handling is "strict". Return ``NULL`` if an exception was
   raised by the codec.
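
   A minimal sketch, assuming an initialized interpreter; ``escapes_to`` is a
   hypothetical helper that checks the escaped form of a string::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>
      #include <string.h>

      /* Escape text with the Unicode-Escape codec and compare the result
         with expected.  Returns 1 on match, 0 on mismatch, -1 on error. */
      static int
      escapes_to(PyObject *text, const char *expected)
      {
          PyObject *b = PyUnicode_AsUnicodeEscapeString(text);
          if (b == NULL) {
              return -1;
          }
          size_t n = strlen(expected);
          int ok = (size_t)PyBytes_GET_SIZE(b) == n
                   && memcmp(PyBytes_AS_STRING(b), expected, n) == 0;
          Py_DECREF(b);
          return ok;
      }

   For instance, the string ``"é\n"`` (U+00E9 followed by a newline) escapes
   to the six ASCII bytes ``\xe9\n``, and decoding the bytes ``\u20ac`` with
   :c:func:`PyUnicode_DecodeUnicodeEscape` yields U+20AC EURO SIGN.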


.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
   return a bytes object. Return ``NULL`` if an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUnicodeEscapeString`.


Raw-Unicode-Escape Codecs
"""""""""""""""""""""""""

These are the "Raw Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
                Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*. Return ``NULL`` if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Raw-Unicode-Escape and return the result as
   a bytes object. Error handling is "strict". Return ``NULL`` if an exception
   was raised by the codec.
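
   A minimal sketch, assuming an initialized interpreter; ``decode_raw`` is a
   hypothetical helper. Unlike Unicode-Escape, this codec only interprets
   ``\uXXXX`` and ``\UXXXXXXXX`` escapes, so other backslash sequences pass
   through unchanged::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>
      #include <string.h>

      /* Decode a null-terminated Raw-Unicode-Escape byte string.  Returns a
         new str, or NULL with an exception set. */
      static PyObject *
      decode_raw(const char *data)
      {
          return PyUnicode_DecodeRawUnicodeEscape(data, (Py_ssize_t)strlen(data),
                                                  "strict");
      }

   For example, the bytes ``\n\u20ac`` decode to three code points: a
   backslash, ``n`` and U+20AC. In the other direction, code points below
   U+0100 are emitted as single raw bytes.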


.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
                Py_ssize_t size)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
   and return a bytes object. Return ``NULL`` if an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsRawUnicodeEscapeString` or
      :c:func:`PyUnicode_AsEncodedString`.


Latin-1 Codecs
""""""""""""""

These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
ordinals and only these are accepted by the codecs during encoding.


.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*. Return ``NULL`` if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)

   Encode a Unicode object using Latin-1 and return the result as a Python bytes
   object. Error handling is "strict". Return ``NULL`` if an exception was
   raised by the codec.
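
   A minimal sketch, assuming an initialized interpreter;
   ``latin1_encodable`` is a hypothetical helper that reports whether a single
   code point survives :c:func:`PyUnicode_AsLatin1String`::

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <assert.h>

      /* Return 1 if code point cp can be encoded with Latin-1, 0 if the
         codec rejects it (UnicodeEncodeError), or -1 on another error. */
      static int
      latin1_encodable(Py_UCS4 cp)
      {
          PyObject *text = PyUnicode_FromOrdinal((int)cp);
          if (text == NULL) {
              return -1;
          }
          PyObject *b = PyUnicode_AsLatin1String(text);
          Py_DECREF(text);
          if (b != NULL) {
              Py_DECREF(b);
              return 1;
          }
          if (PyErr_ExceptionMatches(PyExc_UnicodeEncodeError)) {
              PyErr_Clear();
              return 0;
          }
          return -1;
      }

   For example, U+00E9 is encodable while U+20AC EURO SIGN is rejected with
   :exc:`UnicodeEncodeError`. Decoding, by contrast, cannot fail, since every
   byte value maps to the code point with the same value.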


.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and
   return a Python bytes object. Return ``NULL`` if an exception was raised by
   the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsLatin1String` or
      :c:func:`PyUnicode_AsEncodedString`.


Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1406	ASCII Codecs
				1407	""""""""""""
				1408
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1409	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				1410	codes generate errors.
				1411
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1412
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1413	.. c:function:: PyObject* PyUnicode_DecodeASCII(const char s, Py_ssize_t size, const char errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1414
				1415	Create a Unicode object by decoding size bytes of the ASCII encoded string
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1416	s. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1417
				1418
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1419	.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				1420
				1421	Encode a Unicode object using ASCII and return the result as Python bytes
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1422	object. Error handling is "strict". Return ``NULL`` if an exception was
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1423	raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and
   return a Python bytes object. Return ``NULL`` if an exception was raised by
   the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsASCIIString` or
      :c:func:`PyUnicode_AsEncodedString`.


Character Map Codecs
""""""""""""""""""""

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses a mapping to encode and
decode characters. The mapping objects provided must support the
:meth:`__getitem__` mapping interface; dictionaries and sequences work well.

These are the mapping codec APIs:

.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
   PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *data*
   using the given *mapping* object. Return ``NULL`` if an exception was raised
   by the codec.

   If *mapping* is ``NULL``, Latin-1 decoding will be applied. Otherwise,
   *mapping* must map byte ordinals (integers in the range from 0 to 255)
   to Unicode strings, integers (which are then interpreted as Unicode
   ordinals) or ``None``. Unmapped data bytes (ones which cause a
   :exc:`LookupError`, as well as ones which get mapped to ``None``,
   ``0xFFFE`` or ``'\ufffe'``) are treated as undefined mappings and cause
   an error.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1463
				1464
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1465	.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject unicode, PyObject mapping)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1466
Serhiy Storchaka	c85a266	2017-03-19 08:15:17 +0200	[diff] [blame]	1467	Encode a Unicode object using the given mapping object and return the
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1468	result as a bytes object. Error handling is "strict". Return ``NULL`` if an
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1469	exception was raised by the codec.
				1470
Serhiy Storchaka	c85a266	2017-03-19 08:15:17 +0200	[diff] [blame]	1471	The mapping object must map Unicode ordinal integers to bytes objects,
				1472	integers in the range from 0 to 255 or ``None``. Unmapped character
				1473	ordinals (ones which cause a :exc:`LookupError`) as well as mapped to
				1474	``None`` are treated as "undefined mapping" and cause an error.


.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
   PyObject *mapping, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return the result as a bytes object. Return ``NULL`` if
   an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsCharmapString` or
      :c:func:`PyUnicode_AsEncodedString`.


The following codec API is special in that it maps Unicode to Unicode.

.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object. Return ``NULL`` if an exception was raised by the
   codec.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or ``None`` (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be ``NULL``, which indicates
   that the default error handling should be used.


.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
   PyObject *mapping, const char *errors)

   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
   character *mapping* table to it and return the resulting Unicode object.
   Return ``NULL`` when an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_Translate` or the :ref:`generic codec based API
      <codec-registry>`.


MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.

.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return ``NULL`` if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, \
   const char *errors, Py_ssize_t *consumed)

   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeMBCS`. If
   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.


.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python bytes
   object. Error handling is "strict". Return ``NULL`` if an exception was
   raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)

   Encode the Unicode object using the specified code page and return a Python
   bytes object. Return ``NULL`` if an exception was raised by the codec. Use
   the :c:data:`CP_ACP` code page to get the MBCS encoder.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return
   a Python bytes object. Return ``NULL`` if an exception was raised by the
   codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsMBCSString`, :c:func:`PyUnicode_EncodeCodePage` or
      :c:func:`PyUnicode_AsEncodedString`.


Methods & Slots
"""""""""""""""


.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

Unless noted otherwise below, they return ``NULL`` or ``-1`` if an exception
occurs.


.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string giving a list of Unicode strings. If *sep* is ``NULL``, splitting
   will be done at all whitespace substrings. Otherwise, splits occur at the given
   separator. At most *maxsplit* splits will be done. If negative, no limit is
   set. Separators are not included in the resulting list.


.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break. If *keepend* is ``0``, the line break
   characters are not included in the resulting strings; otherwise they are kept.


.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given *separator* and return the resulting
   Unicode string.


.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
   Py_ssize_t start, Py_ssize_t end, int direction)

   Return ``1`` if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == ``-1`` means to do a prefix match, *direction* == ``1`` a suffix match),
   ``0`` otherwise. Return ``-1`` if an error occurred.


.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
   Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == ``1`` means to do a forward search, *direction* == ``-1`` a
   backward search). The return value is the index of the first match; a value of
   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   occurred and an exception has been set.


.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
   Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of the character *ch* in ``str[start:end]`` using
   the given *direction* (*direction* == ``1`` means to do a forward search,
   *direction* == ``-1`` a backward search). The return value is the index of the
   first match; a value of ``-1`` indicates that no match was found, and ``-2``
   indicates that an error occurred and an exception has been set.

   .. versionadded:: 3.3

   .. versionchanged:: 3.7
      *start* and *end* are now adjusted to behave like ``str[start:end]``.


.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
   Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``. Return ``-1`` if an error occurred.


.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
   PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == ``-1`` means replace all
   occurrences.


.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return ``-1``, ``0``, ``1`` for less than, equal, and greater than,
   respectively.

   This function also returns ``-1`` upon failure, so one should call
   :c:func:`PyErr_Occurred` to distinguish an error from a "less than" result.


.. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, const char *string)

   Compare a Unicode object, *uni*, with *string* and return ``-1``, ``0``, ``1`` for less
   than, equal, and greater than, respectively. It is best to pass only
   ASCII-encoded strings, but the function interprets the input string as
   ISO-8859-1 if it contains non-ASCII characters.

   This function does not raise exceptions.


.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.


.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1698
				1699
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1700	.. c:function:: int PyUnicode_Contains(PyObject container, PyObject element)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1701
				1702	Check whether element is contained in container and return true or false
				1703	accordingly.
				1704
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1705	element has to coerce to a one element Unicode string. ``-1`` is returned
				1706	if there was an error.


.. c:function:: void PyUnicode_InternInPlace(PyObject **string)

   Intern the argument *\*string* in place. The argument must be the address of a
   pointer variable pointing to a Python Unicode string object. If there is an
   existing interned string that is the same as *\*string*, it sets *\*string* to
   it (decrementing the reference count of the old string object and incrementing
   the reference count of the interned string object), otherwise it leaves
   *\*string* alone and interns it (incrementing its reference count).
   (Clarification: even though there is a lot of talk about reference counts, think
   of this function as reference-count-neutral; you own the object after the call
   if and only if you owned it before the call.)


.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)

   A combination of :c:func:`PyUnicode_FromString` and
   :c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
   object that has been interned, or a new ("owned") reference to an earlier
   interned string object with the same value.