.. highlight:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
.. sectionauthor:: Georg Brandl <georg@python.org>

Unicode Objects
^^^^^^^^^^^^^^^

Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
use a variety of representations, in order to allow handling the complete range
of Unicode characters while staying memory efficient. There are special cases
for strings where all code points are below 128, 256, or 65536; otherwise, code
points must be below 1114112 (which is the full Unicode range).

:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
and inefficient; it should be avoided in performance- or memory-sensitive
situations.

Due to the transition between the old APIs and the new APIs, Unicode objects
can internally be in two states depending on how they were created:

* "canonical" Unicode objects are all objects created by a non-deprecated
  Unicode API. They use the most efficient representation allowed by the
  implementation.

* "legacy" Unicode objects have been created through one of the deprecated
  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
  :c:type:`Py_UNICODE*` representation; you will have to call
  :c:func:`PyUnicode_READY` on them before calling any other API.

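The code point thresholds mentioned at the start of this section can be
sketched in plain C. This is only an illustration of the size rule described
in the text, not CPython's internal code; the helper name
``bytes_per_code_point`` is hypothetical and not part of the C API.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the C API): number of bytes per code
   point needed for a string whose largest code point is maxchar.
   Returns 0 if maxchar lies outside the Unicode range. */
static int
bytes_per_code_point(uint32_t maxchar)
{
    if (maxchar < 256) {
        return 1;   /* code points below 128 or below 256 */
    }
    if (maxchar < 65536) {
        return 2;   /* code points below 65536 */
    }
    if (maxchar < 1114112) {
        return 4;   /* remaining Unicode range, up to U+10FFFF */
    }
    return 0;       /* not a valid code point */
}
```

The sketch returns the same width for the "below 128" and "below 256" cases,
since both fit into one byte per code point.
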

Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:

.. c:type:: Py_UCS4
            Py_UCS2
            Py_UCS1

   These types are typedefs for unsigned integer types wide enough to contain
   characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with
   single Unicode characters, use :c:type:`Py_UCS4`.

   .. versionadded:: 3.3


.. c:type:: Py_UNICODE

   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
   depending on the platform.

   .. versionchanged:: 3.3
      In previous versions, this was a 16-bit type or a 32-bit type depending on
      whether you selected a "narrow" or "wide" Unicode version of Python at
      build time.


.. c:type:: PyASCIIObject
            PyCompactUnicodeObject
            PyUnicodeObject

   These subtypes of :c:type:`PyObject` represent a Python Unicode object. In
   almost all cases, they shouldn't be used directly, since all API functions
   that deal with Unicode objects take and return :c:type:`PyObject` pointers.

   .. versionadded:: 3.3


.. c:var:: PyTypeObject PyUnicode_Type

   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
   is exposed to Python code as ``str``.


The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:

.. c:function:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.


.. c:function:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.


.. c:function:: int PyUnicode_READY(PyObject *o)

   Ensure the string object *o* is in the "canonical" representation. This is
   required before using any of the access macros described below.

   .. XXX expand on when it is not required

   Returns ``0`` on success and ``-1`` with an exception set on failure, which in
   particular happens if memory allocation fails.

   .. versionadded:: 3.3


.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)

   Return the length of the Unicode string, in code points. *o* has to be a
   Unicode object in the "canonical" representation (not checked).

   .. versionadded:: 3.3


.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
                Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
                Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)

   Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
   integer types for direct character access. No checks are performed to verify
   that the canonical representation has the correct character size; use
   :c:func:`PyUnicode_KIND` to select the right macro. Make sure
   :c:func:`PyUnicode_READY` has been called before accessing this.

   .. versionadded:: 3.3


.. c:macro:: PyUnicode_WCHAR_KIND
             PyUnicode_1BYTE_KIND
             PyUnicode_2BYTE_KIND
             PyUnicode_4BYTE_KIND

   Return values of the :c:func:`PyUnicode_KIND` macro.

   .. versionadded:: 3.3


.. c:function:: int PyUnicode_KIND(PyObject *o)

   Return one of the PyUnicode kind constants (see above) that indicate how many
   bytes per character this Unicode object uses to store its data. *o* has to
   be a Unicode object in the "canonical" representation (not checked).

   .. XXX document "0" return value?

   .. versionadded:: 3.3


.. c:function:: void* PyUnicode_DATA(PyObject *o)

   Return a void pointer to the raw Unicode buffer. *o* has to be a Unicode
   object in the "canonical" representation (not checked).

   .. versionadded:: 3.3


.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
                                     Py_UCS4 value)

   Write into a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`). This macro does not do any sanity checks and is
   intended for usage in loops. The caller should cache the *kind* value and
   *data* pointer as obtained from other macro calls. *index* is the index in
   the string (starts at 0) and *value* is the new code point value which should
   be written to that location.

   .. versionadded:: 3.3


.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)

   Read a code point from a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`). No checks or ready calls are performed.

   .. versionadded:: 3.3


.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)

   Read a character from a Unicode object *o*, which must be in the "canonical"
   representation. This is less efficient than :c:func:`PyUnicode_READ` if you
   do multiple consecutive reads.

   .. versionadded:: 3.3


.. c:function:: PyUnicode_MAX_CHAR_VALUE(PyObject *o)

   Return the maximum code point that is suitable for creating another string
   based on *o*, which must be in the "canonical" representation. This is
   always an approximation but more efficient than iterating over the string.

   .. versionadded:: 3.3


.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
   code units (this includes surrogate pairs as 2 units). *o* has to be a
   Unicode object (not checked).

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.


.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
   bytes. *o* has to be a Unicode object (not checked).

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.


.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
                const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
   returned buffer is always terminated with an extra null code point. It
   may also contain embedded null code points, which would cause the string
   to be truncated when used in most C functions. The ``AS_DATA`` form
   casts the pointer to :c:type:`const char *`. The *o* argument has to be
   a Unicode object (not checked).

   .. versionchanged:: 3.3
      This macro is now inefficient -- because in many cases the
      :c:type:`Py_UNICODE` representation does not exist and needs to be created
      -- and can fail (return ``NULL`` with an exception set). Try to port the
      code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API, please migrate to using the
      :c:func:`PyUnicode_nBYTE_DATA` family of macros.


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.


.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a whitespace character.


.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a lowercase character.


.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an uppercase character.


.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a titlecase character.


.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a linebreak character.


.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a decimal character.


.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a digit character.


.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a numeric character.


.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an alphabetic character.


.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is an alphanumeric character.


.. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)

   Return ``1`` or ``0`` depending on whether *ch* is a printable character.
   Nonprintable characters are those characters defined in the Unicode character
   database as "Other" or "Separator", excepting the ASCII space (0x20) which is
   considered printable. (Note that printable characters in this context are
   those which should not be escaped when :func:`repr` is invoked on a string.
   It has no bearing on the handling of strings written to :data:`sys.stdout` or
   :data:`sys.stderr`.)


These APIs can be used for fast direct character conversions:


.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a decimal positive integer. Return
   ``-1`` if this is not possible. This macro does not raise exceptions.


.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible. This macro does not raise exceptions.


.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
   possible. This macro does not raise exceptions.


These APIs can be used to work with surrogates:

.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)

   Check if *ch* is a surrogate (``0xD800 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)

   Check if *ch* is a high surrogate (``0xD800 <= ch <= 0xDBFF``).

.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)

   Check if *ch* is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)

   Join two surrogate characters and return a single Py_UCS4 value.
   *high* and *low* are respectively the leading and trailing surrogates in a
   surrogate pair.

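The arithmetic behind joining a surrogate pair can be sketched in plain C.
The functions below are hypothetical stand-ins that follow the ranges given
above and the standard UTF-16 decoding rule; they are not the macro
definitions themselves, and unlike this sketch the macros are not documented
to validate their arguments.

```c
#include <stdint.h>

/* Hypothetical plain-C counterparts of the range checks described above. */
static int
is_high_surrogate(uint32_t ch)
{
    return 0xD800 <= ch && ch <= 0xDBFF;
}

static int
is_low_surrogate(uint32_t ch)
{
    return 0xDC00 <= ch && ch <= 0xDFFF;
}

/* Combine a leading (high) and trailing (low) surrogate into one code
   point.  Returns UINT32_MAX if the arguments do not form a valid pair. */
static uint32_t
join_surrogates(uint32_t high, uint32_t low)
{
    if (!is_high_surrogate(high) || !is_low_surrogate(low)) {
        return UINT32_MAX;
    }
    /* Each surrogate contributes 10 bits; the result is offset by 0x10000. */
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}
```

For example, the pair ``0xD83D``, ``0xDE00`` joins to ``0x1F600``.
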

Creating and accessing Unicode strings
""""""""""""""""""""""""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:

.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)

   Create a new Unicode object. *maxchar* should be the true maximum code point
   to be placed in the string. As an approximation, it can be rounded up to the
   nearest value in the sequence 127, 255, 65535, 1114111.

   This is the recommended way to allocate a new Unicode object. Objects
   created using this function are not resizable.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
                                                    Py_ssize_t size)

   Create a new Unicode object with the given *kind* (possible values are
   :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
   :c:func:`PyUnicode_KIND`). The *buffer* must point to an array of *size*
   units of 1, 2 or 4 bytes per character, as given by the kind.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the char buffer *u*. The bytes will be
   interpreted as being UTF-8 encoded. The buffer is copied into the new
   object. If the buffer is not ``NULL``, the return value might be a shared
   object, i.e. modification of the data is not allowed.

   If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
   with the buffer set to ``NULL``. This usage is deprecated in favor of
   :c:func:`PyUnicode_New`.


.. c:function:: PyObject *PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.

				424
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	425	.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	426
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	427	Take a C :c:func:`printf`\ -style format string and a variable number of
toonarmycaptain	85225b6	2019-05-08 11:02:34 -0500	[diff] [blame]	428	arguments, calculate the size of the resulting Python Unicode string and return
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	429	a string with the values formatted into it. The variable arguments must be C
				430	types and must correspond exactly to the format characters in the format
Victor Stinner	1205f27	2010-09-11 00:54:47 +0000	[diff] [blame]	431	ASCII-encoded string. The following format characters are allowed:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	432
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	433	.. % This should be exactly the same as the table in PyErr_Format.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	434	.. % The descriptions for %zd and %zu are wrong, but the truth is complicated
				435	.. % because not all compilers support the %z width modifier -- we fake it
				436	.. % when necessary via interpolating PY_FORMAT_SIZE_T.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	437	.. % Similar comments apply to the %ll width modifier and
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	438
Georg Brandl	44ea77b	2013-03-28 13:28:44 +0100	[diff] [blame]	439	.. tabularcolumns:: \|l\|l\|L\|
				440
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	441	+-------------------+---------------------+----------------------------------+
				442	\| Format Characters \| Type \| Comment \|
				443	+===================+=====================+==================================+
				444	\| :attr:`%%` \| n/a \| The literal % character. \|
				445	+-------------------+---------------------+----------------------------------+
				446	\| :attr:`%c` \| int \| A single character, \|
				447	\| \| \| represented as a C int. \|
				448	+-------------------+---------------------+----------------------------------+
				449	\| :attr:`%d` \| int \| Equivalent to \|
				450	\| \| \| ``printf("%d")``. [1]_ \|
				451	+-------------------+---------------------+----------------------------------+
				452	\| :attr:`%u` \| unsigned int \| Equivalent to \|
				453	\| \| \| ``printf("%u")``. [1]_ \|
				454	+-------------------+---------------------+----------------------------------+
				455	\| :attr:`%ld` \| long \| Equivalent to \|
				456	\| \| \| ``printf("%ld")``. [1]_ \|
				457	+-------------------+---------------------+----------------------------------+
				458	\| :attr:`%li` \| long \| Equivalent to \|
				459	\| \| \| ``printf("%li")``. [1]_ \|
				460	+-------------------+---------------------+----------------------------------+
				461	\| :attr:`%lu` \| unsigned long \| Equivalent to \|
				462	\| \| \| ``printf("%lu")``. [1]_ \|
				463	+-------------------+---------------------+----------------------------------+
				464	\| :attr:`%lld` \| long long \| Equivalent to \|
				465	\| \| \| ``printf("%lld")``. [1]_ \|
				466	+-------------------+---------------------+----------------------------------+
				467	\| :attr:`%lli` \| long long \| Equivalent to \|
				468	\| \| \| ``printf("%lli")``. [1]_ \|
				469	+-------------------+---------------------+----------------------------------+
				470	\| :attr:`%llu` \| unsigned long long \| Equivalent to \|
				471	\| \| \| ``printf("%llu")``. [1]_ \|
				472	+-------------------+---------------------+----------------------------------+
				473	\| :attr:`%zd` \| Py_ssize_t \| Equivalent to \|
				474	\| \| \| ``printf("%zd")``. [1]_ \|
				475	+-------------------+---------------------+----------------------------------+
				476	\| :attr:`%zi` \| Py_ssize_t \| Equivalent to \|
				477	\| \| \| ``printf("%zi")``. [1]_ \|
				478	+-------------------+---------------------+----------------------------------+
				479	\| :attr:`%zu` \| size_t \| Equivalent to \|
				480	\| \| \| ``printf("%zu")``. [1]_ \|
				481	+-------------------+---------------------+----------------------------------+
				482	\| :attr:`%i` \| int \| Equivalent to \|
				483	\| \| \| ``printf("%i")``. [1]_ \|
				484	+-------------------+---------------------+----------------------------------+
				485	\| :attr:`%x` \| int \| Equivalent to \|
				486	\| \| \| ``printf("%x")``. [1]_ \|
				487	+-------------------+---------------------+----------------------------------+
				488	\| :attr:`%s` \| const char\* \| A null-terminated C character \|
				489	\| \| \| array. \|
				490	+-------------------+---------------------+----------------------------------+
				491	\| :attr:`%p` \| const void\* \| The hex representation of a C \|
				492	\| \| \| pointer. Mostly equivalent to \|
				493	\| \| \| ``printf("%p")`` except that \|
				494	\| \| \| it is guaranteed to start with \|
				495	\| \| \| the literal ``0x`` regardless \|
				496	\| \| \| of what the platform's \|
				497	\| \| \| ``printf`` yields. \|
				498	+-------------------+---------------------+----------------------------------+
				499	\| :attr:`%A` \| PyObject\* \| The result of calling \|
				500	\| \| \| :func:`ascii`. \|
				501	+-------------------+---------------------+----------------------------------+
				502	\| :attr:`%U` \| PyObject\* \| A Unicode object. \|
				503	+-------------------+---------------------+----------------------------------+
				504	\| :attr:`%V` \| PyObject\*, \| A Unicode object (which may be \|
				505	\| \| const char\* \| ``NULL``) and a null-terminated \|
				506	\| \| \| C character array as a second \|
				507	\| \| \| parameter (which will be used, \|
				508	\| \| \| if the first parameter is \|
				509	\| \| \| ``NULL``). \|
				510	+-------------------+---------------------+----------------------------------+
				511	\| :attr:`%S` \| PyObject\* \| The result of calling \|
				512	\| \| \| :c:func:`PyObject_Str`. \|
				513	+-------------------+---------------------+----------------------------------+
				514	\| :attr:`%R` \| PyObject\* \| The result of calling \|
				515	\| \| \| :c:func:`PyObject_Repr`. \|
				516	+-------------------+---------------------+----------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	517
				518	An unrecognized format character causes all the rest of the format string to be
				519	copied as-is to the result string, and any extra arguments discarded.
				520
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	521	.. note::
Victor Stinner	8cecc8c	2013-05-06 23:11:54 +0200	[diff] [blame]	522	The width is measured in characters rather than bytes.
				523	The precision is measured in bytes for ``"%s"`` and
Serhiy Storchaka	e835b31	2019-10-30 21:37:16 +0200	[diff] [blame]	524	``"%V"`` (if the ``PyObject*`` argument is ``NULL``), and in
Victor Stinner	8cecc8c	2013-05-06 23:11:54 +0200	[diff] [blame]	525	characters for ``"%A"``, ``"%U"``, ``"%S"``, ``"%R"`` and ``"%V"``
Serhiy Storchaka	e835b31	2019-10-30 21:37:16 +0200	[diff] [blame]	526	(if the ``PyObject*`` argument is not ``NULL``).
Victor Stinner	8cecc8c	2013-05-06 23:11:54 +0200	[diff] [blame]	527
Louie Lu	88c38b3	2017-04-27 11:36:35 +0800	[diff] [blame]	528	.. [1] For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi,
				529	zu, i, x): the 0-conversion flag has effect even when a precision is given.
				530
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	531	.. versionchanged:: 3.2
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	532	Support for ``"%lld"`` and ``"%llu"`` added.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	533
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	534	.. versionchanged:: 3.3
				535	Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.
				536
Victor Stinner	8cecc8c	2013-05-06 23:11:54 +0200	[diff] [blame]	537	.. versionchanged:: 3.4
				538	Support for width and precision formatters for ``"%s"``, ``"%A"``, ``"%U"``,
				539	``"%V"``, ``"%S"``, ``"%R"`` added.
				540
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	541
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	542	.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	543
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	544	Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	545	arguments.
				546
Alexander Belopolsky	942af5a	2010-12-04 03:38:46 +0000	[diff] [blame]	547
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	548	.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
				549	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	550
Martin Panter	20d3255	2016-04-15 00:56:21 +0000	[diff] [blame]	551	Decode an encoded object obj to a Unicode object.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	552
Serhiy Storchaka	b757c83	2014-12-05 22:25:22 +0200	[diff] [blame]	553	:class:`bytes`, :class:`bytearray` and other
				554	:term:`bytes-like objects <bytes-like object>`
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	555	are decoded according to the given encoding and using the error handling
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	556	defined by errors. Both can be ``NULL`` to have the interface use the default
Martin Panter	20d3255	2016-04-15 00:56:21 +0000	[diff] [blame]	557	values (see :ref:`builtincodecs` for details).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	558
				559	All other objects, including Unicode objects, cause a :exc:`TypeError` to be
				560	set.
				561
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	562	The API returns ``NULL`` if there was an error. The caller is responsible for
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	563	decref'ing the returned objects.
				564
				565
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	566	.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
				567
				568	Return the length of the Unicode object, in code points.
				569
				570	.. versionadded:: 3.3
				571
				572
Serhiy Storchaka	9c0e1f8	2016-10-08 22:45:38 +0300	[diff] [blame]	573	.. c:function:: Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, \
				574	Py_ssize_t to_start, \
				575	PyObject *from, \
				576	Py_ssize_t from_start, \
				577	Py_ssize_t how_many)
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	578
				579	Copy characters from one Unicode object into another. This function performs
				580	character conversion when necessary and falls back to :c:func:`memcpy` if
				581	possible. Returns ``-1`` and sets an exception on error, otherwise returns
Serhiy Storchaka	9c0e1f8	2016-10-08 22:45:38 +0300	[diff] [blame]	582	the number of copied characters.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	583
				584	.. versionadded:: 3.3
				585
				586
Victor Stinner	606e19d	2012-01-04 03:59:16 +0100	[diff] [blame]	587	.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
Victor Stinner	3fe5531	2012-01-04 00:33:50 +0100	[diff] [blame]	588	Py_ssize_t length, Py_UCS4 fill_char)
				589
				590	Fill a string with a character: write fill_char into
				591	``unicode[start:start+length]``.
				592
				593	Fail if fill_char is larger than the string's maximum character, or if the
				594	string has more than one reference.
				595
				596	Return the number of written characters, or return ``-1`` and raise an
				597	exception on error.
				598
				599	.. versionadded:: 3.3
				600
				601
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	602	.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
				603	Py_UCS4 character)
				604
				605	Write a character to a string. The string must have been created through
				606	:c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
				607	the string must not be shared, or have been hashed yet.
				608
				609	This function checks that unicode is a Unicode object, that the index is
				610	not out of bounds, and that the object can be modified safely (i.e. that
Berker Peksag	544ae59	2016-04-24 03:06:44 +0300	[diff] [blame]	611	its reference count is one).
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	612
				613	.. versionadded:: 3.3
				614
				615
				616	.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
				617
				618	Read a character from a string. This function checks that unicode is a
				619	Unicode object and the index is not out of bounds, in contrast to the macro
				620	version :c:func:`PyUnicode_READ_CHAR`.
				621
				622	.. versionadded:: 3.3
				623
				624
				625	.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
				626	Py_ssize_t end)
				627
				628	Return a substring of str, from character index start (included) to
				629	character index end (excluded). Negative indices are not supported.
				630
				631	.. versionadded:: 3.3
				632
				633
				634	.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
				635	Py_ssize_t buflen, int copy_null)
				636
				637	Copy the string u into a UCS4 buffer, including a null character, if
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	638	copy_null is set. Returns ``NULL`` and sets an exception on error (in
Serhiy Storchaka	cc16423	2016-10-02 21:29:26 +0300	[diff] [blame]	639	particular, a :exc:`SystemError` if buflen is smaller than the length of
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	640	u). buffer is returned on success.
				641
				642	.. versionadded:: 3.3
				643
				644
				645	.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)
				646
				647	Copy the string u into a new UCS4 buffer that is allocated using
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	648	:c:func:`PyMem_Malloc`. If this fails, ``NULL`` is returned with a
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	649	:exc:`MemoryError` set. The returned buffer always has an extra
				650	null code point appended.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	651
				652	.. versionadded:: 3.3
				653
				654
				655	Deprecated Py_UNICODE APIs
				656	""""""""""""""""""""""""""
				657
				658	.. deprecated-removed:: 3.3 4.0
				659
				660	These API functions are deprecated with the implementation of :pep:`393`.
				661	Extension modules can continue using them, as they will not be removed in Python
				662	3.x, but need to be aware that their use can now cause performance and memory hits.
				663
				664
				665	.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
				666
				667	Create a Unicode object from the Py_UNICODE buffer u of the given size. u
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	668	may be ``NULL`` which causes the contents to be undefined. It is the user's
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	669	responsibility to fill in the needed data. The buffer is copied into the new
				670	object.
				671
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	672	If the buffer is not ``NULL``, the return value might be a shared object.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	673	Therefore, modification of the resulting Unicode object is only allowed when
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	674	u is ``NULL``.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	675
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	676	If the buffer is ``NULL``, :c:func:`PyUnicode_READY` must be called once the
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	677	string content has been filled before using any of the access macros such as
				678	:c:func:`PyUnicode_KIND`.
				679
Serhiy Storchaka	f675a37	2016-11-20 12:13:44 +0200	[diff] [blame]	680	Please migrate to using :c:func:`PyUnicode_FromKindAndData`,
				681	:c:func:`PyUnicode_FromWideChar` or :c:func:`PyUnicode_New`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	682
				683
				684	.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
				685
				686	Return a read-only pointer to the Unicode object's internal
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	687	:c:type:`Py_UNICODE` buffer, or ``NULL`` on error. This will create the
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	688	:c:type:`Py_UNICODE*` representation of the object if it is not yet
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	689	available. The buffer is always terminated with an extra null code point.
				690	Note that the resulting :c:type:`Py_UNICODE` string may also contain
				691	embedded null code points, which would cause the string to be truncated when
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	692	used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	693
				694	Please migrate to using :c:func:`PyUnicode_AsUCS4`,
Serhiy Storchaka	f675a37	2016-11-20 12:13:44 +0200	[diff] [blame]	695	:c:func:`PyUnicode_AsWideChar`, :c:func:`PyUnicode_ReadChar` or similar new
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	696	APIs.
				697
				698
				699	.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
				700
				701	Create a Unicode object by replacing all decimal digits in the
				702	:c:type:`Py_UNICODE` buffer s of the given size with ASCII digits 0--9
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	703	according to their decimal value. Return ``NULL`` if an exception occurs.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	704
				705
				706	.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
				707
				708	Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	709	array length (excluding the extra null terminator) in size.
				710	Note that the resulting :c:type:`Py_UNICODE*` string
				711	may contain embedded null code points, which would cause the string to be
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	712	truncated when used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	713
				714	.. versionadded:: 3.3
				715
				716
				717	.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
				718
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	719	Create a copy of a Unicode string ending with a null code point. Return ``NULL``
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	720	and raise a :exc:`MemoryError` exception on memory allocation failure,
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	721	otherwise return a newly allocated buffer (use :c:func:`PyMem_Free` to free
				722	the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	723	contain embedded null code points, which would cause the string to be
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	724	truncated when used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	725
				726	.. versionadded:: 3.2
				727
				728	Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.
				729
				730
				731	.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
				732
				733	Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
				734	code units (this includes surrogate pairs as 2 units).
				735
				736	Please migrate to using :c:func:`PyUnicode_GetLength`.
				737
				738
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	739	.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	740
Martin Panter	20d3255	2016-04-15 00:56:21 +0000	[diff] [blame]	741	Copy an instance of a Unicode subtype to a new true Unicode object if
				742	necessary. If obj is already a true Unicode object (not a subtype),
				743	return the reference with incremented refcount.
				744
				745	Objects other than Unicode or its subtypes will cause a :exc:`TypeError`.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	746
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	747
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	748	Locale Encoding
				749	"""""""""""""""
				750
				751	The current locale encoding can be used to decode text from the operating
				752	system.
				753
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	754	.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, \
				755	Py_ssize_t len, \
				756	const char *errors)
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	757
pxinwr	f4b0a1c	2019-03-04 17:02:06 +0800	[diff] [blame]	758	Decode a string from UTF-8 on Android and VxWorks, or from the current
				759	locale encoding on other platforms. The supported
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	760	error handlers are ``"strict"`` and ``"surrogateescape"``
				761	(:pep:`383`). The decoder uses ``"strict"`` error handler if
Andrew Svetlov	f4c3a18	2012-11-29 15:23:15 +0200	[diff] [blame]	762	errors is ``NULL``. str must end with a null character but
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	763	cannot contain embedded null characters.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	764
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	765	Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
				766	:c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
				767	Python startup).
				768
Victor Stinner	7ed7aea	2018-01-15 10:45:49 +0100	[diff] [blame]	769	This function ignores the Python UTF-8 mode.
				770
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	771	.. seealso::
				772
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	773	The :c:func:`Py_DecodeLocale` function.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	774
				775	.. versionadded:: 3.3
				776
Victor Stinner	7ed7aea	2018-01-15 10:45:49 +0100	[diff] [blame]	777	.. versionchanged:: 3.7
				778	The function now also uses the current locale encoding for the
Victor Stinner	9089a26	2018-01-22 19:07:32 +0100	[diff] [blame]	779	``surrogateescape`` error handler, except on Android. Previously, :c:func:`Py_DecodeLocale`
Victor Stinner	7ed7aea	2018-01-15 10:45:49 +0100	[diff] [blame]	780	was used for the ``surrogateescape``, and the current locale encoding was
				781	used for ``strict``.
				782
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	783
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	784	.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	785
				786	Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string
				787	length using :c:func:`strlen`.
				788
				789	.. versionadded:: 3.3
				790
				791
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	792	.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	793
pxinwr	f4b0a1c	2019-03-04 17:02:06 +0800	[diff] [blame]	794	Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current
				795	locale encoding on other platforms. The
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	796	supported error handlers are ``"strict"`` and ``"surrogateescape"``
				797	(:pep:`383`). The encoder uses ``"strict"`` error handler if
Berker Peksag	90e0289	2016-10-17 00:45:56 +0300	[diff] [blame]	798	errors is ``NULL``. Return a :class:`bytes` object. unicode cannot
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	799	contain embedded null characters.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	800
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	801	Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
				802	:c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
				803	Python startup).
				804
Victor Stinner	7ed7aea	2018-01-15 10:45:49 +0100	[diff] [blame]	805	This function ignores the Python UTF-8 mode.
				806
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	807	.. seealso::
				808
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	809	The :c:func:`Py_EncodeLocale` function.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	810
				811	.. versionadded:: 3.3
				812
Victor Stinner	7ed7aea	2018-01-15 10:45:49 +0100	[diff] [blame]	813	.. versionchanged:: 3.7
				814	The function now also uses the current locale encoding for the
Victor Stinner	9089a26	2018-01-22 19:07:32 +0100	[diff] [blame]	815	``surrogateescape`` error handler, except on Android. Previously,
				816	:c:func:`Py_EncodeLocale`
Victor Stinner	7ed7aea	2018-01-15 10:45:49 +0100	[diff] [blame]	817	was used for the ``surrogateescape``, and the current locale encoding was
				818	used for ``strict``.
				819
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	820
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	821	File System Encoding
				822	""""""""""""""""""""
				823
				824	To encode and decode file names and other environment strings,
Steve Dower	cc16be8	2016-09-08 10:35:16 -0700	[diff] [blame]	825	:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
				826	:c:data:`Py_FileSystemDefaultEncodeErrors` should be used as the error handler
				827	(:pep:`383` and :pep:`529`). To encode file names to :class:`bytes` during
				828	argument parsing, the ``"O&"`` converter should be used, passing
				829	:c:func:`PyUnicode_FSConverter` as the conversion function:
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	830
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	831	.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	832
Brett Cannon	ec6ce87	2016-09-06 15:50:29 -0700	[diff] [blame]	833	ParseTuple converter: encode :class:`str` objects -- obtained directly or
				834	through the :class:`os.PathLike` interface -- to :class:`bytes` using
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	835	:c:func:`PyUnicode_EncodeFSDefault`; :class:`bytes` objects are output as-is.
				836	result must be a :c:type:`PyBytesObject*` which must be released when it is
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	837	no longer used.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	838
				839	.. versionadded:: 3.1
				840
Brett Cannon	ec6ce87	2016-09-06 15:50:29 -0700	[diff] [blame]	841	.. versionchanged:: 3.6
				842	Accepts a :term:`path-like object`.
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	843
Steve Dower	cc16be8	2016-09-08 10:35:16 -0700	[diff] [blame]	844	To decode file names to :class:`str` during argument parsing, the ``"O&"``
				845	converter should be used, passing :c:func:`PyUnicode_FSDecoder` as the
				846	conversion function:
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	847
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	848	.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	849
Brett Cannon	a571120	2016-09-06 19:36:01 -0700	[diff] [blame]	850	ParseTuple converter: decode :class:`bytes` objects -- obtained either
				851	directly or indirectly through the :class:`os.PathLike` interface -- to
				852	:class:`str` using :c:func:`PyUnicode_DecodeFSDefaultAndSize`; :class:`str`
				853	objects are output as-is. result must be a :c:type:`PyUnicodeObject*` which
				854	must be released when it is no longer used.
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	855
				856	.. versionadded:: 3.2
				857
Brett Cannon	a571120	2016-09-06 19:36:01 -0700	[diff] [blame]	858	.. versionchanged:: 3.6
				859	Accepts a :term:`path-like object`.
				860
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	861
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	862	.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	863
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	864	Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
Steve Dower	cc16be8	2016-09-08 10:35:16 -0700	[diff] [blame]	865	:c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	866
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	867	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				868	locale encoding.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	869
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	870	:c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
				871	locale encoding and cannot be modified later. If you need to decode a string
				872	from the current locale encoding, use
				873	:c:func:`PyUnicode_DecodeLocaleAndSize`.
				874
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	875	.. seealso::
				876
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	877	The :c:func:`Py_DecodeLocale` function.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	878
Steve Dower	cc16be8	2016-09-08 10:35:16 -0700	[diff] [blame]	879	.. versionchanged:: 3.6
				880	Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	881
				882
				883	.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
				884
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	885	Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
Steve Dower	cc16be8	2016-09-08 10:35:16 -0700	[diff] [blame]	886	and the :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	887
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	888	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				889	locale encoding.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	890
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	891	Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	892
Steve Dower	cc16be8	2016-09-08 10:35:16 -0700	[diff] [blame]	893	.. versionchanged:: 3.6
				894	Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	895
				896
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	897	.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	898
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	899	Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
Steve Dower	cc16be8	2016-09-08 10:35:16 -0700	[diff] [blame]	900	:c:data:`Py_FileSystemDefaultEncodeErrors` error handler, and return
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	901	:class:`bytes`. Note that the resulting :class:`bytes` object may contain
				902	null bytes.
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	903
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	904	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				905	locale encoding.
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	906
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	907	:c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
				908	locale encoding and cannot be modified later. If you need to encode a string
				909	to the current locale encoding, use :c:func:`PyUnicode_EncodeLocale`.
				910
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	911	.. seealso::
				912
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	913	The :c:func:`Py_EncodeLocale` function.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	914
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	915	.. versionadded:: 3.2
				916
Steve Dower	cc16be8	2016-09-08 10:35:16 -0700	[diff] [blame]	917	.. versionchanged:: 3.6
				918	Use :c:data:`Py_FileSystemDefaultEncodeErrors` error handler.
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	919
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	920	wchar_t Support
				921	"""""""""""""""
				922
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	923	:c:type:`wchar_t` support for platforms which support it:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	924
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	925	.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	926
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	927	Create a Unicode object from the :c:type:`wchar_t` buffer w of the given size.
Serhiy Storchaka	1ecf7d2	2016-10-27 21:41:19 +0300	[diff] [blame]	928	Passing ``-1`` as the size indicates that the function must itself compute the length,
Martin v. Löwis	790465f	2008-04-05 20:41:37 +0000	[diff] [blame]	929	using wcslen.
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	930	Return ``NULL`` on failure.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	931
				932
Serhiy Storchaka	57dd79e	2018-12-19 15:31:40 +0200	[diff] [blame]	933	.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	934
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	935	Copy the Unicode object contents into the :c:type:`wchar_t` buffer w. At most
				936	size :c:type:`wchar_t` characters are copied (excluding a possibly trailing
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	937	null termination character). Return the number of :c:type:`wchar_t` characters
Serhiy Storchaka	1ecf7d2	2016-10-27 21:41:19 +0300	[diff] [blame]	938	copied or ``-1`` in case of an error. Note that the resulting :c:type:`wchar_t*`
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	939	string may or may not be null-terminated. It is the responsibility of the caller
				940	to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	941	required by the application. Also, note that the :c:type:`wchar_t*` string
				942	might contain null characters, which would cause the string to be truncated
				943	when used with most C functions.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	944
				945
Victor Stinner	beb4135b	2010-10-07 01:02:42 +0000	[diff] [blame]	946	.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	947
				948	Convert the Unicode object to a wide character string. The output string
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	949	always ends with a null character. If size is not ``NULL``, write the number
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	950	of wide characters (excluding the trailing null termination character) into
Serhiy Storchaka	e613e6a	2017-06-27 16:03:14 +0300	[diff] [blame]	951	*\*size*. Note that the resulting :c:type:`wchar_t` string might contain
				952	null characters, which would cause the string to be truncated when used with
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	953	most C functions. If size is ``NULL`` and the :c:type:`wchar_t*` string
Serhiy Storchaka	e613e6a	2017-06-27 16:03:14 +0300	[diff] [blame]	954	contains null characters a :exc:`ValueError` is raised.
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	955
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	956	Returns a buffer allocated by :c:func:`PyMem_Malloc` (use
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	957	:c:func:`PyMem_Free` to free it) on success. On error, returns ``NULL``
Serhiy Storchaka	e613e6a	2017-06-27 16:03:14 +0300	[diff] [blame]	958	and *\*size* is undefined. Raises a :exc:`MemoryError` if memory allocation
				959	fails.
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	960
				961	.. versionadded:: 3.2
				962
Serhiy Storchaka	e613e6a	2017-06-27 16:03:14 +0300	[diff] [blame]	963	.. versionchanged:: 3.7
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	964	Raises a :exc:`ValueError` if size is ``NULL`` and the :c:type:`wchar_t*`
Serhiy Storchaka	e613e6a	2017-06-27 16:03:14 +0300	[diff] [blame]	965	string contains null characters.
				966
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	967
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	968	.. _builtincodecs:
				969
				970	Built-in Codecs
				971	^^^^^^^^^^^^^^^
				972
Georg Brandl	22b3431	2009-07-26 14:54:51 +0000	[diff] [blame]	973	Python provides a set of built-in codecs which are written in C for speed. All of
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	974	these codecs are directly usable via the following functions.
				975
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	976	Many of the following APIs take two arguments, encoding and errors, which
				977	have the same semantics as those of the built-in :func:`str` string object
				978	constructor.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	979
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	980	Setting encoding to ``NULL`` causes the default encoding, UTF-8, to be used.
Eric Wieser	bf15d5b	2020-02-10 23:32:18 +0000	[diff] [blame^]	981	File system calls should use
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	982	:c:func:`PyUnicode_FSConverter` for encoding file names. This uses the
				983	variable :c:data:`Py_FileSystemDefaultEncoding` internally. This
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	984	variable should be treated as read-only: on some systems, it will be a
Martin v. Löwis	c15bdef	2009-05-29 14:47:46 +0000	[diff] [blame]	985	pointer to a static string, on others, it will change at run-time
				986	(such as when the application invokes setlocale).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	987
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	988	Error handling is set by errors, which may also be ``NULL`` to use
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	989	the default handling defined for the codec. Default error handling for all
Georg Brandl	22b3431	2009-07-26 14:54:51 +0000	[diff] [blame]	990	built-in codecs is "strict" (:exc:`ValueError` is raised).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	991
				992	The codecs all use a similar interface. For simplicity, only deviations from
				993	the following generic ones are documented.
				994
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	995
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	996	Generic Codecs
				997	""""""""""""""
				998
				999	These are the generic codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1000
				1001
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1002	.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
				1003	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1004
				1005	Create a Unicode object by decoding size bytes of the encoded string s.
				1006	encoding and errors have the same meaning as the parameters of the same name
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	1007	in the :func:`str` built-in function. The codec to be used is looked up
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1008	using the Python codec registry. Return ``NULL`` if an exception was raised by
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1009	the codec.
				1010
				1011
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1012	.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
				1013	const char *encoding, const char *errors)
				1014
				1015	Encode a Unicode object and return the result as a Python bytes object.
				1016	encoding and errors have the same meaning as the parameters of the same
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	1017	name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1018	using the Python codec registry. Return ``NULL`` if an exception was raised by
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1019	the codec.
				1020
				1021
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1022	.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
				1023	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1024
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1025	Encode the :c:type:`Py_UNICODE` buffer s of the given size and return a Python
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1026	bytes object. encoding and errors have the same meaning as the
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	1027	parameters of the same name in the Unicode :meth:`~str.encode` method. The codec
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1028	to be used is looked up using the Python codec registry. Return ``NULL`` if an
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1029	exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1030
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1031	.. deprecated-removed:: 3.3 4.0
				1032	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1033	:c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1034
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1035
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1036	UTF-8 Codecs
				1037	""""""""""""
				1038
				1039	These are the UTF-8 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1040
				1041
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1042	.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1043
				1044	Create a Unicode object by decoding size bytes of the UTF-8 encoded string
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1045	s. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1046
				1047
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1048	.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
				1049	const char *errors, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1050
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1051	If consumed is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF8`. If
				1052	consumed is not ``NULL``, trailing incomplete UTF-8 byte sequences will not be
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1053	treated as an error. Those bytes will not be decoded and the number of bytes
				1054	that have been decoded will be stored in consumed.
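
The relationship between the input and the value stored in consumed can be
modelled in plain C. The sketch below does not call the C API and is not
CPython's implementation; the helper name ``utf8_complete_prefix`` is
hypothetical, and the input is assumed to be valid UTF-8 apart from a possibly
truncated final sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the C API): return how many leading
   bytes of s form complete UTF-8 sequences.  This models the count a
   stateful decoder would be expected to report through *consumed. */
static size_t
utf8_complete_prefix(const unsigned char *s, size_t size)
{
    assert(s != NULL);

    size_t i = size;
    size_t back = 0;
    /* Step back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (s[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0) {
        return size;            /* no lead byte: leave it to error handling */
    }

    unsigned char lead = s[i - 1];
    size_t need;                /* length announced by the lead byte */
    if (lead < 0x80) {
        need = 1;
    } else if ((lead & 0xE0) == 0xC0) {
        need = 2;
    } else if ((lead & 0xF0) == 0xE0) {
        need = 3;
    } else if ((lead & 0xF8) == 0xF0) {
        need = 4;
    } else {
        return size;            /* invalid lead byte, not a truncation */
    }

    size_t have = size - (i - 1);   /* bytes present in the last sequence */
    return have < need ? i - 1 : size;
}
```

A caller reading from a stream would typically keep the bytes after the
returned offset and prepend them to the next chunk before decoding again.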
				1055
				1056
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1057	.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1058
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1059	Encode a Unicode object using UTF-8 and return the result as a Python bytes
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1060	object. Error handling is "strict". Return ``NULL`` if an exception was
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1061	raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1062
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1063
Serhiy Storchaka	2a404b6	2017-01-22 23:07:07 +0200	[diff] [blame]	1064	.. c:function:: const char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1065
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	1066	Return a pointer to the UTF-8 encoding of the Unicode object, and
				1067	store the size of the encoded representation (in bytes) in size. The
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1068	size argument can be ``NULL``; in this case no size will be stored. The
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	1069	returned buffer always has an extra null byte appended (not included in
				1070	size), regardless of whether there are any other null code points.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1071
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1072	In the case of an error, ``NULL`` is returned with an exception set and no
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1073	size is stored.
				1074
				1075	This caches the UTF-8 representation of the string in the Unicode object, and
				1076	subsequent calls will return a pointer to the same buffer. The caller is not
				1077	responsible for deallocating the buffer.
				1078
				1079	.. versionadded:: 3.3
				1080
Serhiy Storchaka	2a404b6	2017-01-22 23:07:07 +0200	[diff] [blame]	1081	.. versionchanged:: 3.7
				1082	The return type is now ``const char *`` rather than ``char *``.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1083
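
Because the encoded buffer may contain null bytes before the terminating one,
a caller that needs the whole string should rely on the stored size rather
than on ``strlen()``. The plain C sketch below illustrates the difference on
an ordinary buffer; it does not call the C API, and the helper name
``has_embedded_nul`` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: report whether the first size bytes of buf contain
   a null byte.  size plays the role of the value stored by
   PyUnicode_AsUTF8AndSize and excludes the terminating null byte. */
static int
has_embedded_nul(const char *buf, size_t size)
{
    assert(buf != NULL);
    return memchr(buf, '\0', size) != NULL;
}
```

For the five-byte buffer ``"ab\0cd"``, ``strlen()`` stops at the first null
byte and reports 2, whereas ``has_embedded_nul(buf, 5)`` is nonzero and signals
that C string functions would truncate the data.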
Serhiy Storchaka	2a404b6	2017-01-22 23:07:07 +0200	[diff] [blame]	1084
				1085	.. c:function:: const char* PyUnicode_AsUTF8(PyObject *unicode)
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1086
				1087	As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
				1088
				1089	.. versionadded:: 3.3
				1090
Serhiy Storchaka	2a404b6	2017-01-22 23:07:07 +0200	[diff] [blame]	1091	.. versionchanged:: 3.7
				1092	The return type is now ``const char *`` rather than ``char *``.
				1093
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1094
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1095	.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				1096
				1097	Encode the :c:type:`Py_UNICODE` buffer s of the given size using UTF-8 and
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1098	return a Python bytes object. Return ``NULL`` if an exception was raised by
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1099	the codec.
				1100
				1101	.. deprecated-removed:: 3.3 4.0
				1102	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
Serhiy Storchaka	f675a37	2016-11-20 12:13:44 +0200	[diff] [blame]	1103	:c:func:`PyUnicode_AsUTF8String`, :c:func:`PyUnicode_AsUTF8AndSize` or
				1104	:c:func:`PyUnicode_AsEncodedString`.
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1105
				1106
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1107	UTF-32 Codecs
				1108	"""""""""""""
				1109
				1110	These are the UTF-32 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1111
				1112
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1113	.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
				1114	const char *errors, int *byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1115
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1116	Decode size bytes from a UTF-32 encoded buffer string and return the
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1117	corresponding Unicode object. errors (if non-``NULL``) defines the error
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1118	handling. It defaults to "strict".
				1119
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1120	If byteorder is non-``NULL``, the decoder starts decoding using the given byte
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1121	order::
				1122
				1123	*byteorder == -1: little endian
				1124	*byteorder == 0: native order
				1125	*byteorder == 1: big endian
				1126
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1127	If ``*byteorder`` is zero, and the first four bytes of the input data are a
				1128	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				1129	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				1130	``1``, any byte order mark is copied to the output.
				1131
				1132	After completion, *byteorder* is set to the current byte order at the end
				1133	of input data.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1134
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1135	If byteorder is ``NULL``, the codec starts in native order mode.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1136
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1137	Return ``NULL`` if an exception was raised by the codec.
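
The byte order rules above can be modelled in plain C. The sketch below is an
illustration under stated assumptions, not CPython's implementation: the
helper name ``utf32_handle_bom`` is hypothetical, it does not call the C API,
and it only covers the BOM check at the start of the input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the documented BOM handling.  On entry *byteorder
   is -1, 0 or 1.  Returns the number of leading bytes to skip (4 if a BOM
   was recognised and stripped, otherwise 0) and updates *byteorder to the
   byte order in effect afterwards. */
static size_t
utf32_handle_bom(const unsigned char *s, size_t size, int *byteorder)
{
    assert(s != NULL && byteorder != NULL);

    if (*byteorder != 0 || size < 4) {
        return 0;               /* explicit order: a BOM is ordinary data */
    }
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
        *byteorder = -1;        /* FF FE 00 00: little endian */
        return 4;
    }
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
        *byteorder = 1;         /* 00 00 FE FF: big endian */
        return 4;
    }
    return 0;                   /* no BOM: stay in native order */
}
```

Starting with ``*byteorder == 0`` therefore lets the data choose the byte
order, while ``-1`` or ``1`` forces it and leaves any BOM in the output.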
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1138
				1139
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1140	.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
				1141	const char *errors, int *byteorder, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1142
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1143	If consumed is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF32`. If
				1144	consumed is not ``NULL``, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1145	trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
				1146	by four) as an error. Those bytes will not be decoded and the number of bytes
				1147	that have been decoded will be stored in consumed.
				1148
				1149
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1150	.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				1151
				1152	Return a Python byte string using the UTF-32 encoding in native byte
				1153	order. The string always starts with a BOM mark. Error handling is "strict".
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1154	Return ``NULL`` if an exception was raised by the codec.
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1155
				1156
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1157	.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
				1158	const char *errors, int byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1159
				1160	Return a Python bytes object holding the UTF-32 encoded value of the Unicode
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1161	data in s. Output is written according to the following byte order::
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1162
				1163	byteorder == -1: little endian
				1164	byteorder == 0: native byte order (writes a BOM mark)
				1165	byteorder == 1: big endian
				1166
				1167	If byteorder is ``0``, the output string will always start with the Unicode BOM
				1168	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				1169
Serhiy Storchaka	e835b31	2019-10-30 21:37:16 +0200	[diff] [blame]	1170	If ``Py_UNICODE_WIDE`` is not defined, surrogate pairs will be output
Georg Brandl	3be472b	2015-01-14 08:26:30 +0100	[diff] [blame]	1171	as a single code point.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1172
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1173	Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1174
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1175	.. deprecated-removed:: 3.3 4.0
				1176	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
Serhiy Storchaka	f675a37	2016-11-20 12:13:44 +0200	[diff] [blame]	1177	:c:func:`PyUnicode_AsUTF32String` or :c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1178
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1179
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1180	UTF-16 Codecs
				1181	"""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1182
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1183	These are the UTF-16 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1184
				1185
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1186	.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
				1187	const char *errors, int *byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1188
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1189	Decode size bytes from a UTF-16 encoded buffer string and return the
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1190	corresponding Unicode object. errors (if non-``NULL``) defines the error
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1191	handling. It defaults to "strict".
				1192
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1193	If byteorder is non-``NULL``, the decoder starts decoding using the given byte
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1194	order::
				1195
				1196	*byteorder == -1: little endian
				1197	*byteorder == 0: native order
				1198	*byteorder == 1: big endian
				1199
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1200	If ``*byteorder`` is zero, and the first two bytes of the input data are a
				1201	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				1202	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				1203	``1``, any byte order mark is copied to the output (where it will result in
				1204	either a ``\ufeff`` or a ``\ufffe`` character).
				1205
				1206	After completion, *byteorder* is set to the current byte order at the end
				1207	of input data.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1208
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1209	If byteorder is ``NULL``, the codec starts in native order mode.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1210
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1211	Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1212
				1213
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1214	.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
				1215	const char *errors, int *byteorder, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1216
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1217	If consumed is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF16`. If
				1218	consumed is not ``NULL``, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1219	trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
				1220	split surrogate pair) as an error. Those bytes will not be decoded and the
				1221	number of bytes that have been decoded will be stored in consumed.
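
The two truncation cases named above (an odd trailing byte and a split
surrogate pair) can be modelled in plain C. This is a sketch, not CPython's
implementation: the helper name ``utf16_complete_prefix`` and its
``little_endian`` flag are hypothetical, and the input is assumed to be
otherwise well formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return how many leading bytes of s form complete
   UTF-16 characters.  This models the count a stateful decoder would be
   expected to report through *consumed. */
static size_t
utf16_complete_prefix(const unsigned char *s, size_t size, int little_endian)
{
    assert(s != NULL);

    size_t end = size & ~(size_t)1;     /* drop a trailing odd byte */
    if (end >= 2) {
        /* High byte of the last complete 16-bit code unit. */
        unsigned char hi = little_endian ? s[end - 1] : s[end - 2];
        /* 0xD800..0xDBFF is a lead surrogate still waiting for its trail. */
        if (hi >= 0xD8 && hi <= 0xDB) {
            end -= 2;
        }
    }
    return end;
}
```

For example, the little-endian bytes ``61 00 3D D8`` end with the lead
surrogate U+D83D, so only the first two bytes count as decodable.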
				1222
				1223
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1224	.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				1225
				1226	Return a Python byte string using the UTF-16 encoding in native byte
				1227	order. The string always starts with a BOM mark. Error handling is "strict".
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1228	Return ``NULL`` if an exception was raised by the codec.
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1229
				1230
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1231	.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
				1232	const char *errors, int byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1233
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1234	Return a Python bytes object holding the UTF-16 encoded value of the Unicode
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1235	data in s. Output is written according to the following byte order::
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1236
				1237	byteorder == -1: little endian
				1238	byteorder == 0: native byte order (writes a BOM mark)
				1239	byteorder == 1: big endian
				1240
				1241	If byteorder is ``0``, the output string will always start with the Unicode BOM
				1242	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				1243
Serhiy Storchaka	e835b31	2019-10-30 21:37:16 +0200	[diff] [blame]	1244	If ``Py_UNICODE_WIDE`` is defined, a single :c:type:`Py_UNICODE` value may get
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1245	represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
Martin Panter	6245cb3	2016-04-15 02:14:19 +0000	[diff] [blame]	1246	value is interpreted as a UCS-2 character.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1247
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1248	Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1249
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1250	.. deprecated-removed:: 3.3 4.0
				1251	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
Serhiy Storchaka	f675a37	2016-11-20 12:13:44 +0200	[diff] [blame]	1252	:c:func:`PyUnicode_AsUTF16String` or :c:func:`PyUnicode_AsEncodedString`.
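
The surrogate-pair representation mentioned for
:c:func:`PyUnicode_EncodeUTF16` follows the standard UTF-16 rule. The plain C
sketch below shows that rule for a single code point; the helper name
``utf16_put`` and its ``big_endian`` flag are hypothetical, it does not call
the C API, and it writes no BOM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write code point cp as UTF-16 into out, which must
   have room for 4 bytes.  Returns the number of bytes written (2 or 4). */
static size_t
utf16_put(uint32_t cp, int big_endian, unsigned char *out)
{
    assert(out != NULL && cp <= 0x10FFFF);

    uint16_t units[2];
    size_t n = 0;
    if (cp >= 0x10000) {
        /* Outside the BMP: split the 20-bit offset into two surrogates. */
        cp -= 0x10000;
        units[n++] = (uint16_t)(0xD800 | (cp >> 10));
        units[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
    } else {
        units[n++] = (uint16_t)cp;
    }

    for (size_t i = 0; i < n; i++) {
        unsigned char hi = (unsigned char)(units[i] >> 8);
        unsigned char lo = (unsigned char)(units[i] & 0xFF);
        out[2 * i] = big_endian ? hi : lo;
        out[2 * i + 1] = big_endian ? lo : hi;
    }
    return 2 * n;
}
```

With this rule, U+1F600 becomes the code units ``D83D DE00``, which are
written as the bytes ``D8 3D DE 00`` in big-endian order.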
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1253
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1254
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1255	UTF-7 Codecs
				1256	""""""""""""
				1257
				1258	These are the UTF-7 codec APIs:
				1259
				1260
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1261	.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1262
				1263	Create a Unicode object by decoding size bytes of the UTF-7 encoded string
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1264	s. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1265
				1266
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1267	.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
				1268	const char *errors, Py_ssize_t *consumed)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1269
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1270	If consumed is ``NULL``, behave like :c:func:`PyUnicode_DecodeUTF7`. If
				1271	consumed is not ``NULL``, trailing incomplete UTF-7 base-64 sections will not
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1272	be treated as an error. Those bytes will not be decoded and the number of
				1273	bytes that have been decoded will be stored in consumed.
				1274
				1275
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1276	.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
				1277	int base64SetO, int base64WhiteSpace, const char *errors)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1278
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1279	Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1280	return a Python bytes object. Return ``NULL`` if an exception was raised by
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1281	the codec.
				1282
				1283	If base64SetO is nonzero, "Set O" (punctuation that has no otherwise
				1284	special meaning) will be encoded in base-64. If base64WhiteSpace is
				1285	nonzero, whitespace will be encoded in base-64. Both are set to zero for the
				1286	Python "utf-7" codec.
				1287
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1288	.. deprecated-removed:: 3.3 4.0
Serhiy Storchaka	f675a37	2016-11-20 12:13:44 +0200	[diff] [blame]	1289	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1290	:c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1291
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1292
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1293	Unicode-Escape Codecs
				1294	"""""""""""""""""""""
				1295
				1296	These are the "Unicode Escape" codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1297
				1298
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1299	.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
				1300	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1301
				1302	Create a Unicode object by decoding size bytes of the Unicode-Escape encoded
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1303	string s. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1304
				1305
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1306	.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				1307
Serhiy Storchaka	cf36835	2016-11-20 17:20:19 +0200	[diff] [blame]	1308	Encode a Unicode object using Unicode-Escape and return the result as a
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1309	bytes object. Error handling is "strict". Return ``NULL`` if an exception was
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1310	raised by the codec.
				1311
				1312
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1313	.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1314
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1315	Encode the :c:type:`Py_UNICODE` buffer of the given size using Unicode-Escape and
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1316	return a bytes object. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1317
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1318	.. deprecated-removed:: 3.3 4.0
				1319	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1320	:c:func:`PyUnicode_AsUnicodeEscapeString`.
				1321
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1322
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1323	Raw-Unicode-Escape Codecs
				1324	"""""""""""""""""""""""""
				1325
				1326	These are the "Raw Unicode Escape" codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1327
				1328
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1329	.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
				1330	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1331
				1332	Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1333	encoded string s. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1334
				1335
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1336	.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				1337
				1338	Encode a Unicode object using Raw-Unicode-Escape and return the result as
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1339	a bytes object. Error handling is "strict". Return ``NULL`` if an exception
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1340	was raised by the codec.
				1341
				1342
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1343	.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
Serhiy Storchaka	57dd79e	2018-12-19 15:31:40 +0200	[diff] [blame]	1344	Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1345
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1346	Encode the :c:type:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1347	and return a bytes object. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1348
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1349	.. deprecated-removed:: 3.3 4.0
				1350	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
Serhiy Storchaka	f675a37	2016-11-20 12:13:44 +0200	[diff] [blame]	1351	:c:func:`PyUnicode_AsRawUnicodeEscapeString` or
				1352	:c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1353
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1354
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1355	Latin-1 Codecs
				1356	""""""""""""""
				1357
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1358	These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
				1359	ordinals and only these are accepted by the codecs during encoding.
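
The restriction to the first 256 ordinals can be modelled in plain C. The
sketch below is not CPython's implementation and does not call the C API; the
helper name ``latin1_encode`` is hypothetical, and it mimics "strict" error
handling by failing on the first code point above U+00FF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode n code points as Latin-1 into out, which
   must have room for n bytes.  Returns 0 on success, or -1 if a code
   point cannot be represented (out is then only partly filled). */
static int
latin1_encode(const uint32_t *cps, size_t n, unsigned char *out)
{
    assert(cps != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        if (cps[i] > 0xFF) {
            return -1;          /* outside the first 256 ordinals */
        }
        out[i] = (unsigned char)cps[i];   /* ordinal and byte value agree */
    }
    return 0;
}
```

Decoding is the reverse mapping and cannot fail, since every byte value
corresponds to exactly one of those 256 ordinals.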
				1360
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1361
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1362	.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1363
				1364	Create a Unicode object by decoding size bytes of the Latin-1 encoded string
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1365	s. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1366
				1367
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1368	.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				1369
				1370	Encode a Unicode object using Latin-1 and return the result as a Python bytes
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1371	object. Error handling is "strict". Return ``NULL`` if an exception was
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1372	raised by the codec.
				1373
				1374
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1375	.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1376
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1377	Encode the :c:type:`Py_UNICODE` buffer of the given size using Latin-1 and
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1378	return a Python bytes object. Return ``NULL`` if an exception was raised by
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1379	the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1380
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1381	.. deprecated-removed:: 3.3 4.0
				1382	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
Serhiy Storchaka	f675a37	2016-11-20 12:13:44 +0200	[diff] [blame]	1383	:c:func:`PyUnicode_AsLatin1String` or
				1384	:c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1385
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1386
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1387	ASCII Codecs
				1388	""""""""""""
				1389
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1390	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				1391	codes generate errors.
				1392
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1393
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1394	.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1395
				1396	Create a Unicode object by decoding size bytes of the ASCII encoded string
Serhiy Storchaka	25fc088	2019-10-30 12:03:20 +0200	[diff] [blame]	1397	s. Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1398
				1399
.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)

   Encode a Unicode object using ASCII and return the result as a Python bytes
   object.  Error handling is "strict".  Return ``NULL`` if an exception was
   raised by the codec.
				1405
				1406
.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and
   return a Python bytes object.  Return ``NULL`` if an exception was raised by
   the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsASCIIString` or
      :c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1417
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1418
Character Map Codecs
""""""""""""""""""""

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package).  The codec uses mappings to encode and
decode characters.  The mapping objects provided must support the
:meth:`__getitem__` mapping interface; dictionaries and sequences work well.

These are the mapping codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1429
.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *data*
   using the given *mapping* object.  Return ``NULL`` if an exception was raised
   by the codec.

   If *mapping* is ``NULL``, Latin-1 decoding will be applied.  Otherwise,
   *mapping* must map byte ordinals (integers in the range from 0 to 255)
   to Unicode strings, integers (which are then interpreted as Unicode
   ordinals) or ``None``.  Unmapped data bytes, that is, ones which cause a
   :exc:`LookupError` as well as ones which get mapped to ``None``,
   ``0xFFFE`` or ``'\ufffe'``, are treated as undefined mappings and cause
   an error.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1444
				1445
.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

   Encode a Unicode object using the given *mapping* object and return the
   result as a bytes object.  Error handling is "strict".  Return ``NULL`` if an
   exception was raised by the codec.

   The *mapping* object must map Unicode ordinal integers to bytes objects,
   integers in the range from 0 to 255 or ``None``.  Unmapped character
   ordinals (ones which cause a :exc:`LookupError`) as well as ones mapped to
   ``None`` are treated as "undefined mapping" and cause an error.
Jeroen Ruigrok van der Werven	47a7d70	2009-04-27 05:43:17 +0000	[diff] [blame]	1456
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1457
.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return the result as a bytes object.  Return ``NULL`` if
   an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsCharmapString` or
      :c:func:`PyUnicode_AsEncodedString`.
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1469
				1470
The following codec API is special in that it maps Unicode to Unicode.

.. c:function:: PyObject* PyUnicode_Translate(PyObject *unicode, \
                              PyObject *mapping, const char *errors)

   Translate a Unicode object using the given *mapping* object and return the
   resulting Unicode object.  Return ``NULL`` if an exception was raised by the
   codec.

   The *mapping* object must map Unicode ordinal integers to Unicode strings,
   integers (which are then interpreted as Unicode ordinals) or ``None``
   (causing deletion of the character).  Unmapped character ordinals (ones
   which cause a :exc:`LookupError`) are left untouched and are copied as-is.
				1484
				1485
.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)

   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
   character *mapping* table to it and return the resulting Unicode object.
   Return ``NULL`` when an exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_Translate` or the :ref:`generic codec based API
      <codec-registry>`.
				1497
				1498
MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs.  They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions.  Note that MBCS (or
DBCS) is a class of encodings, not just one.  The target encoding is defined by
the user settings on the machine running the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1506
.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return ``NULL`` if an exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1511
				1512
.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)

   If *consumed* is ``NULL``, behave like :c:func:`PyUnicode_DecodeMBCS`.  If
   *consumed* is not ``NULL``, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.
				1520
				1521
.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python bytes
   object.  Error handling is "strict".  Return ``NULL`` if an exception was
   raised by the codec.
				1527
				1528
.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)

   Encode the Unicode object using the specified code page and return a Python
   bytes object.  Return ``NULL`` if an exception was raised by the codec.  Use
   the :c:data:`CP_ACP` code page to get the MBCS encoder.

   .. versionadded:: 3.3
				1536
				1537
.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return
   a Python bytes object.  Return ``NULL`` if an exception was raised by the
   codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsMBCSString`, :c:func:`PyUnicode_EncodeCodePage` or
      :c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1548
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1549
Methods & Slots
"""""""""""""""


.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

They all return ``NULL`` or ``-1`` if an exception occurs.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1564
				1565
.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.
				1569
				1570
.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string, giving a list of Unicode strings.  If *sep* is ``NULL``, splitting
   will be done at all whitespace substrings.  Otherwise, splits occur at the given
   separator.  At most *maxsplit* splits will be done.  If *maxsplit* is negative,
   no limit is set.  Separators are not included in the resulting list.
				1577
				1578
.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break.  If *keepend* is ``0``, the line break
   characters are not included in the resulting strings.
				1584
				1585
.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \
                              const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or ``None`` (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well.  Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs.  It may be ``NULL``, which indicates
   that the default error handling should be used.
				1601
				1602
.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given *separator* and return the resulting
   Unicode string.
				1607
				1608
.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
                        Py_ssize_t start, Py_ssize_t end, int direction)

   Return ``1`` if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == ``-1`` means to do a prefix match, *direction* == ``1`` a suffix match),
   ``0`` otherwise.  Return ``-1`` if an error occurred.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1615
				1616
.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
                               Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == ``1`` means to do a forward search, *direction* == ``-1`` a
   backward search).  The return value is the index of the first match; a value of
   ``-1`` indicates that no match was found, and ``-2`` indicates that an error
   occurred and an exception has been set.
				1625
				1626
.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
                               Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of the character *ch* in ``str[start:end]`` using
   the given *direction* (*direction* == ``1`` means to do a forward search,
   *direction* == ``-1`` a backward search).  The return value is the index of the
   first match; a value of ``-1`` indicates that no match was found, and ``-2``
   indicates that an error occurred and an exception has been set.

   .. versionadded:: 3.3

   .. versionchanged:: 3.7
      *start* and *end* are now adjusted to behave like ``str[start:end]``.
				1640
Martin v. Löwis	d63a3b8	2011-09-28 07:41:54 +0200	[diff] [blame]	1641
.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
                               Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``.  Return ``-1`` if an error occurred.
				1647
				1648
.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
                              PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object.  *maxcount* == ``-1`` means replace all
   occurrences.
				1655
				1656
.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return ``-1``, ``0``, ``1`` for less than, equal, and greater than,
   respectively.

   This function also returns ``-1`` upon failure.  Because ``-1`` is a valid
   comparison result as well, one should call :c:func:`PyErr_Occurred` to check
   for errors.
				1664
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1665
.. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, const char *string)

   Compare a Unicode object, *uni*, with *string* and return ``-1``, ``0``, ``1`` for less
   than, equal, and greater than, respectively.  It is best to pass only
   ASCII-encoded strings, but the function interprets the input string as
   ISO-8859-1 if it contains non-ASCII characters.

   This function does not raise exceptions.
Serhiy Storchaka	f4934ea	2016-11-16 10:17:58 +0200	[diff] [blame]	1674
Benjamin Peterson	c22ed14	2008-07-01 19:12:34 +0000	[diff] [blame]	1675
.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
				1686
				1687
.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1692
				1693
.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return true or false
   accordingly.

   *element* has to coerce to a one-element Unicode string.  ``-1`` is returned
   if there was an error.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1701
				1702
.. c:function:: void PyUnicode_InternInPlace(PyObject **string)

   Intern the argument *\*string* in place.  The argument must be the address of a
   pointer variable pointing to a Python Unicode string object.  If there is an
   existing interned string that is the same as *\*string*, it sets *\*string* to
   it (decrementing the reference count of the old string object and incrementing
   the reference count of the interned string object), otherwise it leaves
   *\*string* alone and interns it (incrementing its reference count).
   (Clarification: even though there is a lot of talk about reference counts, think
   of this function as reference-count-neutral; you own the object after the call
   if and only if you owned it before the call.)
				1714
				1715
.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)

   A combination of :c:func:`PyUnicode_FromString` and
   :c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
   object that has been interned, or a new ("owned") reference to an earlier
   interned string object with the same value.