Blame - Doc/c-api/unicode.rst - platform/external/python/cpython3


.. highlightlang:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
.. sectionauthor:: Georg Brandl <georg@python.org>

Unicode Objects
^^^^^^^^^^^^^^^

Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
use a variety of representations, in order to allow handling the complete range
of Unicode characters while staying memory efficient.  There are special cases
for strings where all code points are below 128, 256, or 65536; otherwise, code
points must be below 1114112 (which is the full Unicode range).
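As a rough illustration of why these thresholds matter for memory use, the
following standalone C sketch maps the largest code point of a string to a
storage width.  It does not use the CPython API: the helper names
``bytes_per_code_point`` and ``payload_bytes`` are hypothetical, and object
headers and the terminating null are deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed per code point for a string whose largest code point is
 * max_code_point, using the thresholds 256, 65536 and 1114112 quoted above.
 * Hypothetical helper, not part of the CPython API. */
static size_t bytes_per_code_point(uint32_t max_code_point)
{
    assert(max_code_point < 1114112);   /* must lie inside the Unicode range */
    if (max_code_point < 256)
        return 1;   /* the "< 128" and "< 256" cases both fit in one byte */
    if (max_code_point < 65536)
        return 2;
    return 4;
}

/* Payload size in bytes for `length` code points (model only: the object
 * header and the trailing null are not counted). */
static size_t payload_bytes(size_t length, uint32_t max_code_point)
{
    return length * bytes_per_code_point(max_code_point);
}
```

Under this model, a ten-character string containing a euro sign (U+20AC) needs
20 payload bytes, while a single character outside the Basic Multilingual Plane
pushes every character in the string to four bytes.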
				19
:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
in the Unicode object.  The :c:type:`Py_UNICODE*` representation is deprecated
and inefficient; it should be avoided in performance- or memory-sensitive
situations.

Due to the transition between the old APIs and the new APIs, Unicode objects
can internally be in two states depending on how they were created:

* "canonical" Unicode objects are all objects created by a non-deprecated
  Unicode API.  They use the most efficient representation allowed by the
  implementation.

* "legacy" Unicode objects have been created through one of the deprecated
  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
  :c:type:`Py_UNICODE*` representation; you will have to call
  :c:func:`PyUnicode_READY` on them before calling any other API.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	36
				37
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	38	Unicode Type
				39	""""""""""""
				40
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	41	These are the basic Unicode object types used for the Unicode implementation in
				42	Python:
				43
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	44	.. c:type:: Py_UCS4
				45	Py_UCS2
				46	Py_UCS1
				47
				48	These types are typedefs for unsigned integer types wide enough to contain
				49	characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with
				50	single Unicode characters, use :c:type:`Py_UCS4`.
				51
				52	.. versionadded:: 3.3
				53
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	54
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	55	.. c:type:: Py_UNICODE
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	56
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	57	This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
				58	depending on the platform.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	59
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	60	.. versionchanged:: 3.3
				61	In previous versions, this was a 16-bit type or a 32-bit type depending on
				62	whether you selected a "narrow" or "wide" Unicode version of Python at
				63	build time.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	64
				65
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	66	.. c:type:: PyASCIIObject
				67	PyCompactUnicodeObject
				68	PyUnicodeObject
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	69
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	70	These subtypes of :c:type:`PyObject` represent a Python Unicode object. In
				71	almost all cases, they shouldn't be used directly, since all API functions
				72	that deal with Unicode objects take and return :c:type:`PyObject` pointers.
				73
				74	.. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	75
				76
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	77	.. c:var:: PyTypeObject PyUnicode_Type
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	78
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	79	This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	80	is exposed to Python code as ``str``.
				81
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	82
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	83	The following APIs are really C macros and can be used to do fast checks and to
				84	access internal read-only data of Unicode objects:
				85
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	86	.. c:function:: int PyUnicode_Check(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	87
				88	Return true if the object o is a Unicode object or an instance of a Unicode
				89	subtype.
				90
				91
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	92	.. c:function:: int PyUnicode_CheckExact(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	93
				94	Return true if the object o is a Unicode object, but not an instance of a
				95	subtype.
				96
				97
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	98	.. c:function:: int PyUnicode_READY(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	99
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	100	Ensure the string object o is in the "canonical" representation. This is
				101	required before using any of the access macros described below.
				102
				103	.. XXX expand on when it is not required
				104
				105	Returns 0 on success and -1 with an exception set on failure, which in
				106	particular happens if memory allocation fails.
				107
				108	.. versionadded:: 3.3
				109
				110
				111	.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
				112
				113	Return the length of the Unicode string, in code points. o has to be a
				114	Unicode object in the "canonical" representation (not checked).
				115
				116	.. versionadded:: 3.3
				117
				118
				119	.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
				120	Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
				121	Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)
				122
				123	Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
				124	integer types for direct character access. No checks are performed if the
				125	canonical representation has the correct character size; use
Martin v. Löwis	2da16e6	2011-10-07 20:58:00 +0200	[diff] [blame]	126	:c:func:`PyUnicode_KIND` to select the right macro. Make sure
Martin v. Löwis	c47adb0	2011-10-07 20:55:35 +0200	[diff] [blame]	127	:c:func:`PyUnicode_READY` has been called before accessing this.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	128
				129	.. versionadded:: 3.3
				130
				131
Victor Stinner	b4938aa	2011-11-20 18:27:28 +0100	[diff] [blame]	132	.. c:macro:: PyUnicode_WCHAR_KIND
				133	PyUnicode_1BYTE_KIND
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	134	PyUnicode_2BYTE_KIND
				135	PyUnicode_4BYTE_KIND
				136
				137	Return values of the :c:func:`PyUnicode_KIND` macro.
				138
				139	.. versionadded:: 3.3
				140
				141
				142	.. c:function:: int PyUnicode_KIND(PyObject *o)
				143
				144	Return one of the PyUnicode kind constants (see above) that indicate how many
				145	bytes per character this Unicode object uses to store its data. o has to
				146	be a Unicode object in the "canonical" representation (not checked).
				147
				148	.. XXX document "0" return value?
				149
				150	.. versionadded:: 3.3
				151
				152
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	153	.. c:function:: void* PyUnicode_DATA(PyObject *o)
				154
				155	Return a void pointer to the raw unicode buffer. o has to be a Unicode
				156	object in the "canonical" representation (not checked).
				157
				158	.. versionadded:: 3.3
				159
				160
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	161	.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
				162	Py_UCS4 value)
				163
				164	Write into a canonical representation data (as obtained with
				165	:c:func:`PyUnicode_DATA`). This macro does not do any sanity checks and is
				166	intended for usage in loops. The caller should cache the kind value and
				167	data pointer as obtained from other macro calls. index is the index in
				168	the string (starts at 0) and value is the new code point value which should
				169	be written to that location.
				170
				171	.. versionadded:: 3.3
				172
				173
				174	.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)
				175
				176	Read a code point from a canonical representation data (as obtained with
				177	:c:func:`PyUnicode_DATA`). No checks or ready calls are performed.
				178
				179	.. versionadded:: 3.3
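The read and write macros above dispatch on the kind to pick an element width.
The standalone sketch below models that dispatch in plain C so the loop pattern
(fetch the kind and data pointer once, then index repeatedly) is visible.  The
``KIND_*`` constants and all function names are hypothetical stand-ins, and the
assumption that a kind value equals its byte width belongs to this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kind constants; using the byte width as the value is an
 * assumption of this sketch. */
enum { KIND_1BYTE = 1, KIND_2BYTE = 2, KIND_4BYTE = 4 };

/* Store code point `value` at position `index`.  No bounds or range checks,
 * mirroring the "no sanity checks" contract described above. */
static void write_code_point(int kind, void *data, size_t index, uint32_t value)
{
    switch (kind) {
    case KIND_1BYTE: ((uint8_t *)data)[index] = (uint8_t)value; break;
    case KIND_2BYTE: ((uint16_t *)data)[index] = (uint16_t)value; break;
    default:         ((uint32_t *)data)[index] = value; break;
    }
}

/* Load the code point stored at position `index`. */
static uint32_t read_code_point(int kind, const void *data, size_t index)
{
    switch (kind) {
    case KIND_1BYTE: return ((const uint8_t *)data)[index];
    case KIND_2BYTE: return ((const uint16_t *)data)[index];
    default:         return ((const uint32_t *)data)[index];
    }
}

/* Loop pattern: the caller passes kind and data once, the loop only indexes. */
static size_t count_code_point(int kind, const void *data, size_t length,
                               uint32_t wanted)
{
    size_t hits = 0;
    for (size_t i = 0; i < length; i++) {
        if (read_code_point(kind, data, i) == wanted)
            hits++;
    }
    return hits;
}
```

In real extension code the same shape would apply with the kind and data
pointer obtained once from the object before entering the loop.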
				180
				181
				182	.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)
				183
				184	Read a character from a Unicode object o, which must be in the "canonical"
				185	representation. This is less efficient than :c:func:`PyUnicode_READ` if you
				186	do multiple consecutive reads.
				187
				188	.. versionadded:: 3.3
				189
				190
				191	.. c:function:: PyUnicode_MAX_CHAR_VALUE(PyObject *o)
				192
				193	Return the maximum code point that is suitable for creating another string
				194	based on o, which must be in the "canonical" representation. This is
				195	always an approximation but more efficient than iterating over the string.
				196
				197	.. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	198
Christian Heimes	a156e09	2008-02-16 07:38:31 +0000	[diff] [blame]	199
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	200	.. c:function:: int PyUnicode_ClearFreeList()
Christian Heimes	a156e09	2008-02-16 07:38:31 +0000	[diff] [blame]	201
				202	Clear the free list. Return the total number of freed items.
				203
Alexandre Vassalotti	6d3dfc3	2009-07-29 19:54:39 +0000	[diff] [blame]	204
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	205	.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
				206
				207	Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
				208	code units (this includes surrogate pairs as 2 units). o has to be a
				209	Unicode object (not checked).
				210
				211	.. deprecated-removed:: 3.3 4.0
				212	Part of the old-style Unicode API, please migrate to using
				213	:c:func:`PyUnicode_GET_LENGTH`.
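To see why a size in code units can differ from a length in code points, the
standalone sketch below counts 16-bit code units for a sequence of code points.
It models only a platform where the legacy representation is 16 bits wide; with
a 32-bit representation the two counts coincide.  The function name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 16-bit code units needed for `length` code points: a code point
 * above U+FFFF takes a surrogate pair (2 units), every other one takes 1. */
static size_t utf16_code_units(const uint32_t *code_points, size_t length)
{
    size_t units = 0;
    for (size_t i = 0; i < length; i++)
        units += (code_points[i] > 0xFFFF) ? 2 : 1;
    return units;
}
```

For the three code points U+0041, U+1F600 and U+0042 this yields four code
units, one more than the length in code points.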
				214
				215
				216	.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
				217
				218	Return the size of the deprecated :c:type:`Py_UNICODE` representation in
				219	bytes. o has to be a Unicode object (not checked).
				220
				221	.. deprecated-removed:: 3.3 4.0
				222	Part of the old-style Unicode API, please migrate to using
				223	:c:func:`PyUnicode_GET_LENGTH`.
				224
				225
.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
                const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to a :c:type:`Py_UNICODE` representation of the object.  The
   returned buffer is always terminated with an extra null code point.  It
   may also contain embedded null code points, which would cause the string
   to be truncated when used in most C functions.  The ``AS_DATA`` form
   casts the pointer to :c:type:`const char *`.  The *o* argument has to be
   a Unicode object (not checked).

   .. versionchanged:: 3.3
      This macro is now inefficient, because in many cases the
      :c:type:`Py_UNICODE` representation does not exist and needs to be
      created, and it can fail (return ``NULL`` with an exception set).  Try to
      port the code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API, please migrate to using the
      :c:func:`PyUnicode_nBYTE_DATA` family of macros.
				246
				247
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	248	Unicode Character Properties
				249	""""""""""""""""""""""""""""
				250
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	251	Unicode provides many different character properties. The most often needed ones
				252	are available through these macros which are mapped to C functions depending on
				253	the Python configuration.
				254
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	255
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	256	.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	257
				258	Return 1 or 0 depending on whether ch is a whitespace character.
				259
				260
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	261	.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	262
				263	Return 1 or 0 depending on whether ch is a lowercase character.
				264
				265
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	266	.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	267
				268	Return 1 or 0 depending on whether ch is an uppercase character.
				269
				270
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	271	.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	272
				273	Return 1 or 0 depending on whether ch is a titlecase character.
				274
				275
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	276	.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	277
				278	Return 1 or 0 depending on whether ch is a linebreak character.
				279
				280
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	281	.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	282
				283	Return 1 or 0 depending on whether ch is a decimal character.
				284
				285
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	286	.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	287
				288	Return 1 or 0 depending on whether ch is a digit character.
				289
				290
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	291	.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	292
				293	Return 1 or 0 depending on whether ch is a numeric character.
				294
				295
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	296	.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	297
				298	Return 1 or 0 depending on whether ch is an alphabetic character.
				299
				300
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	301	.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	302
				303	Return 1 or 0 depending on whether ch is an alphanumeric character.
				304
Georg Brandl	559e5d7	2008-06-11 18:37:52 +0000	[diff] [blame]	305
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	306	.. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)
Georg Brandl	559e5d7	2008-06-11 18:37:52 +0000	[diff] [blame]	307
				308	Return 1 or 0 depending on whether ch is a printable character.
				309	Nonprintable characters are those characters defined in the Unicode character
				310	database as "Other" or "Separator", excepting the ASCII space (0x20) which is
				311	considered printable. (Note that printable characters in this context are
				312	those which should not be escaped when :func:`repr` is invoked on a string.
				313	It has no bearing on the handling of strings written to :data:`sys.stdout` or
				314	:data:`sys.stderr`.)
				315
				316
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	317	These APIs can be used for fast direct character conversions:
				318
				319
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	320	.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	321
				322	Return the character ch converted to lower case.
				323
Benjamin Peterson	b2bf01d	2012-01-11 18:17:06 -0500	[diff] [blame]	324	.. deprecated:: 3.3
				325	This function uses simple case mappings.
				326
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	327
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	328	.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	329
				330	Return the character ch converted to upper case.
				331
Benjamin Peterson	b2bf01d	2012-01-11 18:17:06 -0500	[diff] [blame]	332	.. deprecated:: 3.3
				333	This function uses simple case mappings.
				334
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	335
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	336	.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	337
				338	Return the character ch converted to title case.
				339
Benjamin Peterson	b2bf01d	2012-01-11 18:17:06 -0500	[diff] [blame]	340	.. deprecated:: 3.3
				341	This function uses simple case mappings.
				342
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	343
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	344	.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	345
				346	Return the character ch converted to a decimal positive integer. Return
				347	``-1`` if this is not possible. This macro does not raise exceptions.
				348
				349
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	350	.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	351
				352	Return the character ch converted to a single digit integer. Return ``-1`` if
				353	this is not possible. This macro does not raise exceptions.
				354
				355
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	356	.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	357
				358	Return the character ch converted to a double. Return ``-1.0`` if this is not
				359	possible. This macro does not raise exceptions.
				360
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	361
These APIs can be used to work with surrogates:

.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)

   Check if *ch* is a surrogate (``0xD800 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)

   Check if *ch* is a high surrogate (``0xD800 <= ch <= 0xDBFF``).

.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)

   Check if *ch* is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)

   Join two surrogate characters and return a single :c:type:`Py_UCS4` value.
   *high* and *low* are respectively the leading and trailing surrogates in a
   surrogate pair.
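The range checks and the join operation follow the standard UTF-16 arithmetic.
The standalone sketch below spells that arithmetic out with plain functions; it
is a model of what the macros compute, not their actual definitions, and the
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Range checks matching the intervals given above. */
static int is_surrogate(uint32_t ch)      { return 0xD800 <= ch && ch <= 0xDFFF; }
static int is_high_surrogate(uint32_t ch) { return 0xD800 <= ch && ch <= 0xDBFF; }
static int is_low_surrogate(uint32_t ch)  { return 0xDC00 <= ch && ch <= 0xDFFF; }

/* Combine a leading (high) and trailing (low) surrogate into one code point:
 * each surrogate contributes its low 10 bits, offset by 0x10000. */
static uint32_t join_surrogates(uint32_t high, uint32_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}
```

For example, the pair ``0xD83D``, ``0xDE00`` joins to ``0x1F600``, and the
extreme pairs map to ``0x10000`` and ``0x10FFFF``.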
				381
				382
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	383	Creating and accessing Unicode strings
				384	""""""""""""""""""""""""""""""""""""""
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	385
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	386	To create Unicode objects and access their basic sequence properties, use these
				387	APIs:
				388
.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)

   Create a new Unicode object.  *maxchar* should be the true maximum code point
   to be placed in the string.  As an approximation, it can be rounded up to the
   nearest value in the sequence 127, 255, 65535, 1114111.

   This is the recommended way to allocate a new Unicode object.  Objects
   created using this function are not resizable.

   .. versionadded:: 3.3
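The rounding rule for the maximum code point can be sketched in standalone C as
follows.  Both helpers are hypothetical and independent of the CPython API; they
only show how a caller might derive an acceptable value before allocating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest code point in a buffer of `length` code points (0 if empty). */
static uint32_t max_code_point_of(const uint32_t *code_points, size_t length)
{
    uint32_t max = 0;
    for (size_t i = 0; i < length; i++) {
        if (code_points[i] > max)
            max = code_points[i];
    }
    return max;
}

/* Round up to the nearest value in the sequence 127, 255, 65535, 1114111. */
static uint32_t round_up_maxchar(uint32_t max_code_point)
{
    assert(max_code_point <= 1114111);
    if (max_code_point <= 127)
        return 127;
    if (max_code_point <= 255)
        return 255;
    if (max_code_point <= 65535)
        return 65535;
    return 1114111;
}
```

A buffer holding U+0048, U+00E9 and U+20AC has a true maximum of ``0x20AC``,
which rounds up to 65535.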
				399
				400
.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
                                                    Py_ssize_t size)

   Create a new Unicode object with the given *kind* (possible values are
   :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
   :c:func:`PyUnicode_KIND`).  The *buffer* must point to an array of *size*
   units of 1, 2 or 4 bytes per character, as given by the kind.

   .. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	410
				411
.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the char buffer *u*.  The bytes will be
   interpreted as being UTF-8 encoded.  The buffer is copied into the new
   object.  If the buffer is not ``NULL``, the return value might be a shared
   object, i.e. modification of the data is not allowed.

   If *u* is ``NULL``, this function behaves like :c:func:`PyUnicode_FromUnicode`
   with the buffer set to ``NULL``.  This usage is deprecated in favor of
   :c:func:`PyUnicode_New`.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	422
				423
.. c:function:: PyObject* PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.
				428
				429
.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)

   Take a C :c:func:`printf`\ -style *format* string and a variable number of
   arguments, calculate the size of the resulting Python Unicode string and return
   a string with the values formatted into it.  The variable arguments must be C
   types and must correspond exactly to the format characters in the
   ASCII-encoded *format* string.  The following format characters are allowed:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	437
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	438	.. % This should be exactly the same as the table in PyErr_Format.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	439	.. % The descriptions for %zd and %zu are wrong, but the truth is complicated
				440	.. % because not all compilers support the %z width modifier -- we fake it
				441	.. % when necessary via interpolating PY_FORMAT_SIZE_T.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	442	.. % Similar comments apply to the %ll width modifier and
				443	.. % PY_FORMAT_LONG_LONG.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	444
   .. tabularcolumns:: |l|l|L|

   +-------------------+---------------------+--------------------------------+
   | Format Characters | Type                | Comment                        |
   +===================+=====================+================================+
   | :attr:`%%`        | n/a                 | The literal % character.       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%c`        | int                 | A single character,            |
   |                   |                     | represented as a C int.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%d`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%d")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%u`        | unsigned int        | Exactly equivalent to          |
   |                   |                     | ``printf("%u")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%ld`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%ld")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%li`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%li")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
   |                   |                     | ``printf("%lu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lld`      | long long           | Exactly equivalent to          |
   |                   |                     | ``printf("%lld")``.            |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lli`      | long long           | Exactly equivalent to          |
   |                   |                     | ``printf("%lli")``.            |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%llu`      | unsigned long long  | Exactly equivalent to          |
   |                   |                     | ``printf("%llu")``.            |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zd")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zi`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zi")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zu`       | size_t              | Exactly equivalent to          |
   |                   |                     | ``printf("%zu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%i`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%i")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%x`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%x")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%s`        | char\*              | A null-terminated C character  |
   |                   |                     | array.                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%p`        | void\*              | The hex representation of a C  |
   |                   |                     | pointer. Mostly equivalent to  |
   |                   |                     | ``printf("%p")`` except that   |
   |                   |                     | it is guaranteed to start with |
   |                   |                     | the literal ``0x`` regardless  |
   |                   |                     | of what the platform's         |
   |                   |                     | ``printf`` yields.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%A`        | PyObject\*          | The result of calling          |
   |                   |                     | :func:`ascii`.                 |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%U`        | PyObject\*          | A unicode object.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
   |                   |                     | NULL) and a null-terminated    |
   |                   |                     | C character array as a second  |
   |                   |                     | parameter (which will be used, |
   |                   |                     | if the first parameter is      |
   |                   |                     | NULL).                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling          |
   |                   |                     | :c:func:`PyObject_Str`.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling          |
   |                   |                     | :c:func:`PyObject_Repr`.       |
   +-------------------+---------------------+--------------------------------+
				523
   An unrecognized format character causes all the rest of the format string to be
   copied as-is to the result string, and any extra arguments discarded.

   .. note::
      The width formatter unit is a number of characters rather than bytes.
      The precision formatter unit is a number of bytes for ``"%s"`` and
      ``"%V"`` (if the ``PyObject*`` argument is ``NULL``), and a number of
      characters for ``"%A"``, ``"%U"``, ``"%S"``, ``"%R"`` and ``"%V"``
      (if the ``PyObject*`` argument is not ``NULL``).

   .. versionchanged:: 3.2
      Support for ``"%lld"`` and ``"%llu"`` added.

   .. versionchanged:: 3.3
      Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.

   .. versionchanged:: 3.4
      Support for width and precision formatters for ``"%s"``, ``"%A"``,
      ``"%U"``, ``"%V"``, ``"%S"``, ``"%R"`` added.
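The difference between a precision counted in bytes and one counted in
characters only shows up with non-ASCII text.  The standalone sketch below uses
the standard C ``snprintf`` (whose ``%s`` precision is also byte-based) as a
stand-in to make the effect visible; it does not call the CPython API, and both
helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count code points in a UTF-8 string by skipping continuation bytes
 * (those of the form 10xxxxxx).  Assumes well-formed input. */
static size_t utf8_code_points(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Copy at most `precision` bytes of `src` into `dst`, as a byte-based
 * "%.*s" does.  Returns the number of bytes actually stored in `dst`
 * (excluding the terminating null). */
static size_t truncate_bytes(char *dst, size_t dst_size, const char *src,
                             int precision)
{
    int written;

    assert(dst_size > 0 && precision >= 0);
    written = snprintf(dst, dst_size, "%.*s", precision, src);
    if (written < 0)
        dst[0] = '\0';   /* formatting failed: leave an empty string */
    return strlen(dst);
}
```

Truncating the UTF-8 bytes of "héllo" to a precision of 3 keeps three bytes but
only two characters, because "é" occupies two bytes.  A byte-based precision
can therefore also cut a multi-byte character in half.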
				543
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	544
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	545	.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	546
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	547	Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	548	arguments.
				549
Alexander Belopolsky	942af5a	2010-12-04 03:38:46 +0000	[diff] [blame]	550
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	551	.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
				552	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	553
Martin Panter	20d3255	2016-04-15 00:56:21 +0000	[diff] [blame]	554	Decode an encoded object obj to a Unicode object.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	555
Serhiy Storchaka	b757c83	2014-12-05 22:25:22 +0200	[diff] [blame]	556	:class:`bytes`, :class:`bytearray` and other
				557	:term:`bytes-like objects <bytes-like object>`
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	558	are decoded according to the given encoding and using the error handling
				559	defined by errors. Both can be NULL to have the interface use the default
Martin Panter	20d3255	2016-04-15 00:56:21 +0000	[diff] [blame]	560	values (see :ref:`builtincodecs` for details).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	561
				562	All other objects, including Unicode objects, cause a :exc:`TypeError` to be
				563	set.
				564
				565	The API returns NULL if there was an error. The caller is responsible for
				566	decref'ing the returned objects.
				567
				568
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	569	.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
				570
				571	Return the length of the Unicode object, in code points.
				572
				573	.. versionadded:: 3.3
				574
				575
				576	.. c:function:: int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, \
Serhiy Storchaka	cdd0279	2013-08-08 16:47:43 +0300	[diff] [blame]	577	PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many)
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	578
				579	Copy characters from one Unicode object into another. This function performs
				580	character conversion when necessary and falls back to :c:func:`memcpy` if
				581	possible. Returns ``-1`` and sets an exception on error, otherwise returns
				582	``0``.
				583
				584	.. versionadded:: 3.3
				585
				586
Victor Stinner	606e19d	2012-01-04 03:59:16 +0100	[diff] [blame]	587	.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
Victor Stinner	3fe5531	2012-01-04 00:33:50 +0100	[diff] [blame]	588	Py_ssize_t length, Py_UCS4 fill_char)
				589
				590	Fill a string with a character: write fill_char into
				591	``unicode[start:start+length]``.
				592
				593	Fail if fill_char is bigger than the string maximum character, or if the
				594	string has more than 1 reference.
				595
				596	Return the number of written characters, or return ``-1`` and raise an
				597	exception on error.
				598
				599	.. versionadded:: 3.3
				600
				601
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	602	.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
				603	Py_UCS4 character)
				604
				605	Write a character to a string. The string must have been created through
				606	:c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
				607	the string must not be shared, or have been hashed yet.
				608
				609	This function checks that unicode is a Unicode object, that the index is
				610	not out of bounds, and that the object can be modified safely (i.e. that
Berker Peksag	544ae59	2016-04-24 03:06:44 +0300	[diff] [blame]	611	its reference count is one).
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	612
				613	.. versionadded:: 3.3
				614
				615
				616	.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
				617
				618	Read a character from a string. This function checks that unicode is a
				619	Unicode object and the index is not out of bounds, in contrast to the macro
				620	version :c:func:`PyUnicode_READ_CHAR`.
				621
				622	.. versionadded:: 3.3
				623
				624
				625	.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
				626	Py_ssize_t end)
				627
				628	Return a substring of str, from character index start (included) to
				629	character index end (excluded). Negative indices are not supported.
				630
				631	.. versionadded:: 3.3
				632
				633
				634	.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
				635	Py_ssize_t buflen, int copy_null)
				636
				637	Copy the string u into a UCS4 buffer, including a null character, if
				638	copy_null is set. Returns NULL and sets an exception on error (in
				639	particular, a :exc:`ValueError` if buflen is smaller than the length of
				640	u). buffer is returned on success.
				641
				642	.. versionadded:: 3.3
				643
				644
				645	.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)
				646
				647	Copy the string u into a new UCS4 buffer that is allocated using
				648	:c:func:`PyMem_Malloc`. If this fails, NULL is returned with a
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	649	:exc:`MemoryError` set. The returned buffer always has an extra
				650	null code point appended.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	651
				652	.. versionadded:: 3.3
				653
				654
				655	Deprecated Py_UNICODE APIs
				656	""""""""""""""""""""""""""
				657
				658	.. deprecated-removed:: 3.3 4.0
				659
				660	These API functions are deprecated with the implementation of :pep:`393`.
				661	Extension modules can continue using them, as they will not be removed in Python
				662	3.x, but need to be aware that their use can now cause performance and memory hits.
				663
				664
				665	.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
				666
				667	Create a Unicode object from the Py_UNICODE buffer u of the given size. u
				668	may be NULL which causes the contents to be undefined. It is the user's
				669	responsibility to fill in the needed data. The buffer is copied into the new
				670	object.
				671
				672	If the buffer is not NULL, the return value might be a shared object.
				673	Therefore, modification of the resulting Unicode object is only allowed when
				674	u is NULL.
				675
				676	If the buffer is NULL, :c:func:`PyUnicode_READY` must be called once the
				677	string content has been filled before using any of the access macros such as
				678	:c:func:`PyUnicode_KIND`.
				679
				680	Please migrate to using :c:func:`PyUnicode_FromKindAndData` or
				681	:c:func:`PyUnicode_New`.
				682
				683
				684	.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
				685
				686	Return a read-only pointer to the Unicode object's internal
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	687	:c:type:`Py_UNICODE` buffer, or NULL on error. This will create the
				688	:c:type:`Py_UNICODE*` representation of the object if it is not yet
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	689	available. The buffer is always terminated with an extra null code point.
				690	Note that the resulting :c:type:`Py_UNICODE` string may also contain
				691	embedded null code points, which would cause the string to be truncated when
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	692	used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	693
				694	Please migrate to using :c:func:`PyUnicode_AsUCS4`,
				695	:c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new
				696	APIs.
				697
				698
				699	.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
				700
				701	Create a Unicode object by replacing all decimal digits in
				702	the :c:type:`Py_UNICODE` buffer of the given size by ASCII digits 0--9
				703	according to their decimal value. Return NULL if an exception occurs.
				704
				705
				706	.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
				707
				708	Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	709	array length (excluding the extra null terminator) in size.
				710	Note that the resulting :c:type:`Py_UNICODE*` string
				711	may contain embedded null code points, which would cause the string to be
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	712	truncated when used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	713
				714	.. versionadded:: 3.3
				715
				716
				717	.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
				718
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	719	Create a copy of a Unicode string ending with a null code point. Return NULL
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	720	and raise a :exc:`MemoryError` exception on memory allocation failure,
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	721	otherwise return a newly allocated buffer (use :c:func:`PyMem_Free` to free
				722	the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	723	contain embedded null code points, which would cause the string to be
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	724	truncated when used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	725
				726	.. versionadded:: 3.2
				727
				728	Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.
				729
				730
				731	.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
				732
				733	Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
				734	code units (this includes surrogate pairs as 2 units).
				735
				736	Please migrate to using :c:func:`PyUnicode_GetLength`.
				737
				738
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	739	.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	740
Martin Panter	20d3255	2016-04-15 00:56:21 +0000	[diff] [blame]	741	Copy an instance of a Unicode subtype to a new true Unicode object if
				742	necessary. If obj is already a true Unicode object (not a subtype),
				743	return the reference with incremented refcount.
				744
				745	Objects other than Unicode or its subtypes will cause a :exc:`TypeError`.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	746
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	747
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	748	Locale Encoding
				749	"""""""""""""""
				750
				751	The current locale encoding can be used to decode text from the operating
				752	system.
				753
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	754	.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, \
				755	Py_ssize_t len, \
				756	const char *errors)
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	757
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	758	Decode a string from the current locale encoding. The supported
				759	error handlers are ``"strict"`` and ``"surrogateescape"``
				760	(:pep:`383`). The decoder uses ``"strict"`` error handler if
Andrew Svetlov	f4c3a18	2012-11-29 15:23:15 +0200	[diff] [blame]	761	errors is ``NULL``. str must end with a null character but
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	762	cannot contain embedded null characters.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	763
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	764	Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
				765	:c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
				766	Python startup).
				767
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	768	.. seealso::
				769
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	770	The :c:func:`Py_DecodeLocale` function.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	771
				772	.. versionadded:: 3.3
				773
				774
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	775	.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	776
				777	Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string
				778	length using :c:func:`strlen`.
				779
				780	.. versionadded:: 3.3
				781
				782
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	783	.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	784
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	785	Encode a Unicode object to the current locale encoding. The
				786	supported error handlers are ``"strict"`` and ``"surrogateescape"``
				787	(:pep:`383`). The encoder uses ``"strict"`` error handler if
				788	errors is ``NULL``. Return a :class:`bytes` object. unicode cannot
				789	contain embedded null characters.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	790
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	791	Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
				792	:c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
				793	Python startup).
				794
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	795	.. seealso::
				796
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	797	The :c:func:`Py_EncodeLocale` function.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	798
				799	.. versionadded:: 3.3
				800
				801
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	802	File System Encoding
				803	""""""""""""""""""""
				804
				805	To encode and decode file names and other environment strings,
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	806	:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	807	``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
				808	encode file names during argument parsing, the ``"O&"`` converter should be
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	809	used, passing :c:func:`PyUnicode_FSConverter` as the conversion function:
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	810
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	811	.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	812
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	813	ParseTuple converter: encode :class:`str` objects to :class:`bytes` using
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	814	:c:func:`PyUnicode_EncodeFSDefault`; :class:`bytes` objects are output as-is.
				815	result must be a :c:type:`PyBytesObject*` which must be released when it is
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	816	no longer used.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	817
				818	.. versionadded:: 3.1
				819
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	820
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	821	To decode file names during argument parsing, the ``"O&"`` converter should be
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	822	used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	823
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	824	.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	825
				826	ParseTuple converter: decode :class:`bytes` objects to :class:`str` using
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	827	:c:func:`PyUnicode_DecodeFSDefaultAndSize`; :class:`str` objects are output
				828	as-is. result must be a :c:type:`PyUnicodeObject*` which must be released
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	829	when it is no longer used.
				830
				831	.. versionadded:: 3.2
				832
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	833
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	834	.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	835
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	836	Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	837	``"surrogateescape"`` error handler, or ``"strict"`` on Windows.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	838
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	839	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				840	locale encoding.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	841
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	842	:c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
				843	locale encoding and cannot be modified later. If you need to decode a string
				844	from the current locale encoding, use
				845	:c:func:`PyUnicode_DecodeLocaleAndSize`.
				846
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	847	.. seealso::
				848
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	849	The :c:func:`Py_DecodeLocale` function.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	850
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	851	.. versionchanged:: 3.2
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	852	Use ``"strict"`` error handler on Windows.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	853
				854
				855	.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
				856
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	857	Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	858	and the ``"surrogateescape"`` error handler, or ``"strict"`` on Windows.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	859
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	860	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				861	locale encoding.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	862
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	863	Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	864
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	865	.. versionchanged:: 3.2
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	866	Use ``"strict"`` error handler on Windows.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	867
				868
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	869	.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	870
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	871	Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	872	``"surrogateescape"`` error handler, or ``"strict"`` on Windows, and return
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	873	:class:`bytes`. Note that the resulting :class:`bytes` object may contain
				874	null bytes.
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	875
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	876	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				877	locale encoding.
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	878
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	879	:c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
				880	locale encoding and cannot be modified later. If you need to encode a string
				881	to the current locale encoding, use :c:func:`PyUnicode_EncodeLocale`.
				882
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	883	.. seealso::
				884
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	885	The :c:func:`Py_EncodeLocale` function.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	886
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	887	.. versionadded:: 3.2
				888
				889
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	890	wchar_t Support
				891	"""""""""""""""
				892
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	893	:c:type:`wchar_t` support for platforms which support it:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	894
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	895	.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	896
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	897	Create a Unicode object from the :c:type:`wchar_t` buffer w of the given size.
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	898	Passing -1 as the size indicates that the function must itself compute the length,
Martin v. Löwis	790465f	2008-04-05 20:41:37 +0000	[diff] [blame]	899	using wcslen.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	900	Return NULL on failure.
				901
				902
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	903	.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	904
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	905	Copy the Unicode object contents into the :c:type:`wchar_t` buffer w. At most
				906	size :c:type:`wchar_t` characters are copied (excluding a possibly trailing
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	907	null termination character). Return the number of :c:type:`wchar_t` characters
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	908	copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	909	string may or may not be null-terminated. It is the responsibility of the caller
				910	to make sure that the :c:type:`wchar_t*` string is null-terminated in case this is
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	911	required by the application. Also, note that the :c:type:`wchar_t*` string
				912	might contain null characters, which would cause the string to be truncated
				913	when used with most C functions.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	914
				915
Victor Stinner	beb4135b	2010-10-07 01:02:42 +0000	[diff] [blame]	916	.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	917
				918	Convert the Unicode object to a wide character string. The output string
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	919	always ends with a null character. If size is not NULL, write the number
				920	of wide characters (excluding the trailing null termination character) into
Victor Stinner	1c24bd0	2010-10-02 11:03:13 +0000	[diff] [blame]	921	*\*size*.
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	922
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	923	Returns a buffer allocated by :c:func:`PyMem_New` (use
				924	:c:func:`PyMem_Free` to free it) on success. On error, returns NULL and
				925	raises a :exc:`MemoryError`; *\*size* is undefined. Note that the
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	926	resulting :c:type:`wchar_t` string might contain null characters, which
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	927	would cause the string to be truncated when used with most C functions.
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	928
				929	.. versionadded:: 3.2
				930
				931
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	932	UCS4 Support
				933	""""""""""""
				934
				935	.. versionadded:: 3.3
				936
				937	.. XXX are these meant to be public?
				938
				939	.. c:function:: size_t Py_UCS4_strlen(const Py_UCS4 *u)
				940	Py_UCS4* Py_UCS4_strcpy(Py_UCS4 *s1, const Py_UCS4 *s2)
				941	Py_UCS4* Py_UCS4_strncpy(Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
				942	Py_UCS4* Py_UCS4_strcat(Py_UCS4 *s1, const Py_UCS4 *s2)
				943	int Py_UCS4_strcmp(const Py_UCS4 *s1, const Py_UCS4 *s2)
				944	int Py_UCS4_strncmp(const Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
Antoine Pitrou	57735a0	2011-10-22 22:08:46 +0200	[diff] [blame]	945	Py_UCS4* Py_UCS4_strchr(const Py_UCS4 *s, Py_UCS4 c)
				946	Py_UCS4* Py_UCS4_strrchr(const Py_UCS4 *s, Py_UCS4 c)
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	947
				948	These utility functions work on strings of :c:type:`Py_UCS4` characters and
				949	otherwise behave like the C standard library functions with the same name.
				950
				951
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	952	.. _builtincodecs:
				953
				954	Built-in Codecs
				955	^^^^^^^^^^^^^^^
				956
Georg Brandl	22b3431	2009-07-26 14:54:51 +0000	[diff] [blame]	957	Python provides a set of built-in codecs which are written in C for speed. All of
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	958	these codecs are directly usable via the following functions.
				959
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	960	Many of the following APIs take two arguments encoding and errors, and they
				961	have the same semantics as the ones of the built-in :func:`str` string object
				962	constructor.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	963
Martin v. Löwis	c15bdef	2009-05-29 14:47:46 +0000	[diff] [blame]	964	Setting encoding to NULL causes the default encoding to be used
				965	which is UTF-8. The file system calls should use
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	966	:c:func:`PyUnicode_FSConverter` for encoding file names. This uses the
				967	variable :c:data:`Py_FileSystemDefaultEncoding` internally. This
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	968	variable should be treated as read-only: on some systems, it will be a
Martin v. Löwis	c15bdef	2009-05-29 14:47:46 +0000	[diff] [blame]	969	pointer to a static string, on others, it will change at run-time
				970	(such as when the application invokes setlocale).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	971
				972	Error handling is set by errors which may also be set to NULL meaning to use
				973	the default handling defined for the codec. Default error handling for all
Georg Brandl	22b3431	2009-07-26 14:54:51 +0000	[diff] [blame]	974	built-in codecs is "strict" (:exc:`ValueError` is raised).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	975
				976	The codecs all use a similar interface. Only deviations from the following
				977	generic ones are documented for simplicity.
				978
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	979
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	980	Generic Codecs
				981	""""""""""""""
				982
				983	These are the generic codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	984
				985
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	986	.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
				987	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	988
				989	Create a Unicode object by decoding size bytes of the encoded string s.
				990	encoding and errors have the same meaning as the parameters of the same name
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	991	in the :func:`str` built-in function. The codec to be used is looked up
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	992	using the Python codec registry. Return NULL if an exception was raised by
				993	the codec.
				994
				995
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	996	.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
				997	const char *encoding, const char *errors)
				998
				999	Encode a Unicode object and return the result as a Python bytes object.
				1000	encoding and errors have the same meaning as the parameters of the same
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	1001	name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1002	using the Python codec registry. Return NULL if an exception was raised by
				1003	the codec.
				1004
				1005
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1006	.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
				1007	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1008
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1009	Encode the :c:type:`Py_UNICODE` buffer s of the given size and return a Python
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1010	bytes object. encoding and errors have the same meaning as the
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	1011	parameters of the same name in the Unicode :meth:`~str.encode` method. The codec
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1012	to be used is looked up using the Python codec registry. Return NULL if an
				1013	exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1014
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1015	.. deprecated-removed:: 3.3 4.0
				1016	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1017	:c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1018
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1019
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1020	UTF-8 Codecs
				1021	""""""""""""
				1022
				1023	These are the UTF-8 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1024
				1025
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1026	.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1027
				1028	Create a Unicode object by decoding size bytes of the UTF-8 encoded string
				1029	s. Return NULL if an exception was raised by the codec.
				1030
				1031
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1032	.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
				1033	const char *errors, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1034
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1035	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF8`. If
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1036	consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be
				1037	treated as an error. Those bytes will not be decoded and the number of bytes
				1038	that have been decoded will be stored in consumed.
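As a rough illustration of the stateful behaviour (a Python-level sketch, not part of the C API itself): ``codecs.utf_8_decode`` with ``final=False`` is assumed here to correspond to passing a non-NULL consumed pointer. The variable names are illustrative only.

```python
import codecs

data = "é€".encode("utf-8")      # 2 bytes for 'é', 3 bytes for '€'
chunk = data[:-1]                # cut inside the 3-byte sequence for '€'

# final=False: a trailing incomplete sequence is left undecoded, not an error.
text, consumed = codecs.utf_8_decode(chunk, "strict", False)

# The caller keeps the undecoded tail and prepends it to the next chunk.
pending = chunk[consumed:]
rest, consumed_rest = codecs.utf_8_decode(pending + data[-1:], "strict", False)
```

The caller is responsible for buffering ``chunk[consumed:]`` between calls; the decoder itself keeps no state.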
				1039
				1040
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1041	.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1042
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1043	Encode a Unicode object using UTF-8 and return the result as a Python bytes
				1044	object. Error handling is "strict". Return NULL if an exception was
				1045	raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1046
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1047
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1048	.. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
				1049
R David Murray	0a560a1	2015-05-13 20:31:53 -0400	[diff] [blame]	1050	Return a pointer to the UTF-8 encoding of the Unicode object, and
				1051	store the size of the encoded representation (in bytes) in size. The
				1052	size argument can be NULL; in this case no size will be stored. The
				1053	returned buffer always has an extra null byte appended (not included in
				1054	size), regardless of whether there are any other null code points.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1055
				1056	In the case of an error, NULL is returned with an exception set and no
				1057	size is stored.
				1058
				1059	This caches the UTF-8 representation of the string in the Unicode object, and
				1060	subsequent calls will return a pointer to the same buffer. The caller is not
				1061	responsible for deallocating the buffer.
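The following sketch probes the properties described above from Python by calling the function through ``ctypes.pythonapi``. It assumes a CPython interpreter that exports the symbol; the ``argtypes``/``restype`` declarations are this example's own and must match the C signature.

```python
import ctypes

as_utf8 = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
as_utf8.argtypes = [ctypes.py_object, ctypes.POINTER(ctypes.c_ssize_t)]
# Use a raw address: c_char_p would stop reading at the first null byte.
as_utf8.restype = ctypes.c_void_p

s = "a\0é"                        # contains an embedded null code point
size = ctypes.c_ssize_t(-1)
ptr = as_utf8(s, ctypes.byref(size))

# Read size bytes plus the extra terminating null byte.
raw = ctypes.string_at(ptr, size.value + 1)

# A second call (here with size == NULL) returns the cached buffer.
ptr_again = as_utf8(s, None)
```

The buffer belongs to the string object, so the example keeps ``s`` alive for as long as ``ptr`` is used.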
				1062
				1063	.. versionadded:: 3.3
				1064
				1065
				1066	.. c:function:: char* PyUnicode_AsUTF8(PyObject *unicode)
				1067
				1068	As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
				1069
				1070	.. versionadded:: 3.3
				1071
				1072
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1073	.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				1074
				1075	Encode the :c:type:`Py_UNICODE` buffer s of the given size using UTF-8 and
				1076	return a Python bytes object. Return NULL if an exception was raised by
				1077	the codec.
				1078
				1079	.. deprecated-removed:: 3.3 4.0
				1080	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1081	:c:func:`PyUnicode_AsUTF8String` or :c:func:`PyUnicode_AsUTF8AndSize`.
				1082
				1083
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1084	UTF-32 Codecs
				1085	"""""""""""""
				1086
				1087	These are the UTF-32 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1088
				1089
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1090	.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
				1091	const char *errors, int *byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1092
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1093	Decode size bytes from a UTF-32 encoded buffer string and return the
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1094	corresponding Unicode object. errors (if non-NULL) defines the error
				1095	handling. It defaults to "strict".
				1096
				1097	If byteorder is non-NULL, the decoder starts decoding using the given byte
				1098	order::
				1099
				1100	*byteorder == -1: little endian
				1101	*byteorder == 0: native order
				1102	*byteorder == 1: big endian
				1103
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1104	If ``*byteorder`` is zero, and the first four bytes of the input data are a
				1105	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				1106	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				1107	``1``, any byte order mark is copied to the output.
				1108
				1109	After completion, ``*byteorder`` is set to the current byte order at the end
				1110	of input data.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1111
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1112	If byteorder is NULL, the codec starts in native order mode.
				1113
				1114	Return NULL if an exception was raised by the codec.
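A small Python-level sketch of the byte order handling, assuming that ``codecs.utf_32_ex_decode`` (which returns the decoded text, the number of bytes consumed, and the final byte order) mirrors the in/out ``*byteorder`` argument of the C function:

```python
import codecs

payload = codecs.BOM_UTF32_LE + "hi".encode("utf-32-le")

# byteorder == 0: the BOM selects the byte order and is dropped from the output.
text_auto, consumed, order_auto = codecs.utf_32_ex_decode(payload, "strict", 0, True)

# byteorder == -1: little endian is forced, so the BOM is decoded as U+FEFF.
text_forced, _, order_forced = codecs.utf_32_ex_decode(payload, "strict", -1, True)
```

Here ``order_auto`` plays the role of the value written back to ``*byteorder`` after completion.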
				1115
				1116
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1117	.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
				1118	const char *errors, int *byteorder, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1119
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1120	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF32`. If
				1121	consumed is not NULL, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1122	trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
				1123	by four) as an error. Those bytes will not be decoded and the number of bytes
				1124	that have been decoded will be stored in consumed.
				1125
				1126
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1127	.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				1128
				1129	Return a Python byte string using the UTF-32 encoding in native byte
				1130	order. The string always starts with a BOM. Error handling is "strict".
				1131	Return NULL if an exception was raised by the codec.
				1132
				1133
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1134	.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
				1135	const char *errors, int byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1136
				1137	Return a Python bytes object holding the UTF-32 encoded value of the Unicode
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1138	data in s. Output is written according to the following byte order::
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1139
				1140	byteorder == -1: little endian
				1141	byteorder == 0: native byte order (writes a BOM mark)
				1142	byteorder == 1: big endian
				1143
				1144	If byteorder is ``0``, the output string will always start with the Unicode BOM
				1145	(U+FEFF). In the other two modes, no BOM is prepended.
				1146
				1147	If Py_UNICODE_WIDE is not defined, surrogate pairs will be output
Georg Brandl	3be472b	2015-01-14 08:26:30 +0100	[diff] [blame]	1148	as a single code point.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1149
				1150	Return NULL if an exception was raised by the codec.
				1151
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1152	.. deprecated-removed:: 3.3 4.0
				1153	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1154	:c:func:`PyUnicode_AsUTF32String`.
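The three byteorder modes can be observed from Python with ``codecs.utf_32_encode``, which takes the same ``-1``/``0``/``1`` values. This is only a sketch of the output layout and assumes the Python-level function follows the same convention as the C encoder:

```python
import codecs
import sys

le, _ = codecs.utf_32_encode("A", "strict", -1)      # little endian, no BOM
be, _ = codecs.utf_32_encode("A", "strict", 1)       # big endian, no BOM
native, _ = codecs.utf_32_encode("A", "strict", 0)   # native order, BOM first

# codecs.BOM_UTF32 is the BOM in the machine's native byte order.
expected_native = codecs.BOM_UTF32 + (le if sys.byteorder == "little" else be)
```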
				1155
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1156
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1157	UTF-16 Codecs
				1158	"""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1159
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1160	These are the UTF-16 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1161
				1162
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1163	.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
				1164	const char *errors, int *byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1165
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1166	Decode size bytes from a UTF-16 encoded buffer string and return the
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1167	corresponding Unicode object. errors (if non-NULL) defines the error
				1168	handling. It defaults to "strict".
				1169
				1170	If byteorder is non-NULL, the decoder starts decoding using the given byte
				1171	order::
				1172
				1173	*byteorder == -1: little endian
				1174	*byteorder == 0: native order
				1175	*byteorder == 1: big endian
				1176
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1177	If ``*byteorder`` is zero, and the first two bytes of the input data are a
				1178	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				1179	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				1180	``1``, any byte order mark is copied to the output (where it will result in
				1181	either a ``\ufeff`` or a ``\ufffe`` character).
				1182
				1183	After completion, ``*byteorder`` is set to the current byte order at the end
				1184	of input data.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1185
				1186	If byteorder is NULL, the codec starts in native order mode.
				1187
				1188	Return NULL if an exception was raised by the codec.
				1189
				1190
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1191	.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
				1192	const char *errors, int *byteorder, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1193
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1194	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF16`. If
				1195	consumed is not NULL, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1196	trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
				1197	split surrogate pair) as an error. Those bytes will not be decoded and the
				1198	number of bytes that have been decoded will be stored in consumed.
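As an illustration of the split surrogate pair case, the sketch below uses ``codecs.utf_16_le_decode`` with ``final=False``, which is assumed to correspond to a non-NULL consumed pointer with little endian byte order:

```python
import codecs

data = "a\U0001F600".encode("utf-16-le")   # 2 bytes for 'a', 4 bytes for the pair
chunk = data[:4]                           # 'a' plus the high surrogate only

# The lone high surrogate at the end is left undecoded instead of raising.
text, consumed = codecs.utf_16_le_decode(chunk, "strict", False)

# Feeding the undecoded tail together with the rest completes the pair.
rest, _ = codecs.utf_16_le_decode(chunk[consumed:] + data[4:], "strict", False)
```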
				1199
				1200
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1201	.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				1202
				1203	Return a Python byte string using the UTF-16 encoding in native byte
				1204	order. The string always starts with a BOM. Error handling is "strict".
				1205	Return NULL if an exception was raised by the codec.
				1206
				1207
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1208	.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
				1209	const char *errors, int byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1210
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1211	Return a Python bytes object holding the UTF-16 encoded value of the Unicode
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1212	data in s. Output is written according to the following byte order::
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1213
				1214	byteorder == -1: little endian
				1215	byteorder == 0: native byte order (writes a BOM mark)
				1216	byteorder == 1: big endian
				1217
				1218	If byteorder is ``0``, the output string will always start with the Unicode BOM
				1219	(U+FEFF). In the other two modes, no BOM is prepended.
				1220
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1221	If Py_UNICODE_WIDE is defined, a single :c:type:`Py_UNICODE` value may get
				1222	represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
Martin Panter	6245cb3	2016-04-15 02:14:19 +0000	[diff] [blame]	1223	value is interpreted as a UCS-2 character.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1224
				1225	Return NULL if an exception was raised by the codec.
				1226
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1227	.. deprecated-removed:: 3.3 4.0
				1228	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1229	:c:func:`PyUnicode_AsUTF16String`.
				1230
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1231
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1232	UTF-7 Codecs
				1233	""""""""""""
				1234
				1235	These are the UTF-7 codec APIs:
				1236
				1237
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1238	.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1239
				1240	Create a Unicode object by decoding size bytes of the UTF-7 encoded string
				1241	s. Return NULL if an exception was raised by the codec.
				1242
				1243
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1244	.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
				1245	const char *errors, Py_ssize_t *consumed)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1246
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1247	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF7`. If
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1248	consumed is not NULL, trailing incomplete UTF-7 base-64 sections will not
				1249	be treated as an error. Those bytes will not be decoded and the number of
				1250	bytes that have been decoded will be stored in consumed.
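A hedged sketch of an incomplete base-64 section, using ``codecs.utf_7_decode`` with ``final=False`` as a stand-in for a non-NULL consumed pointer. In UTF-7, ``+AGE-`` is the shifted encoding of ``a``; the chunk boundary below is chosen for illustration.

```python
import codecs

chunk = b"x+AG"                  # 'x', then an unfinished base-64 section

# The unfinished section is not decoded; consumed stops before the '+'.
text, consumed = codecs.utf_7_decode(chunk, "strict", False)

# Retrying from the start of the section once the remaining bytes arrive.
rest, consumed_rest = codecs.utf_7_decode(chunk[consumed:] + b"E-", "strict", False)
```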
				1251
				1252
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1253	.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
				1254	int base64SetO, int base64WhiteSpace, const char *errors)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1255
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1256	Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1257	return a Python bytes object. Return NULL if an exception was raised by
				1258	the codec.
				1259
				1260	If base64SetO is nonzero, "Set O" (punctuation that otherwise has no
				1261	special meaning) will be encoded in base-64. If base64WhiteSpace is
				1262	nonzero, whitespace will be encoded in base-64. Both are set to zero for the
				1263	Python "utf-7" codec.
				1264
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1265	.. deprecated-removed:: 3.3 4.0
				1266	Part of the old-style :c:type:`Py_UNICODE` API.
				1267
				1268	.. XXX replace with what?
				1269
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1270
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1271	Unicode-Escape Codecs
				1272	"""""""""""""""""""""
				1273
				1274	These are the "Unicode Escape" codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1275
				1276
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1277	.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
				1278	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1279
				1280	Create a Unicode object by decoding size bytes of the Unicode-Escape encoded
				1281	string s. Return NULL if an exception was raised by the codec.
				1282
				1283
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1284	.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				1285
				1286	Encode a Unicode object using Unicode-Escape and return the result as a Python
				1287	bytes object. Error handling is "strict". Return NULL if an exception was
				1288	raised by the codec.
				1289
				1290
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1291	.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1292
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1293	Encode the :c:type:`Py_UNICODE` buffer of the given size using Unicode-Escape and
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1294	return a Python bytes object. Return NULL if an exception was raised by the
				1295	codec.
				1296
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1297	.. deprecated-removed:: 3.3 4.0
				1298	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1299	:c:func:`PyUnicode_AsUnicodeEscapeString`.
				1300
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1301
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1302	Raw-Unicode-Escape Codecs
				1303	"""""""""""""""""""""""""
				1304
				1305	These are the "Raw Unicode Escape" codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1306
				1307
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1308	.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
				1309	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1310
				1311	Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape
				1312	encoded string s. Return NULL if an exception was raised by the codec.
				1313
				1314
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1315	.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				1316
				1317	Encode a Unicode object using Raw-Unicode-Escape and return the result as a
				1318	Python bytes object. Error handling is "strict". Return NULL if an exception
				1319	was raised by the codec.
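To make the difference between the two escape codecs concrete, here is a short Python-level comparison. It uses the ``unicode_escape`` and ``raw_unicode_escape`` codec names, which are assumed to be backed by the functions documented in this section and the previous one:

```python
text = "é\n€"

# Unicode-Escape: everything outside printable ASCII becomes an escape sequence.
escaped = text.encode("unicode_escape")

# Raw-Unicode-Escape: Latin-1 characters are written as raw bytes, and only
# code points above U+00FF are written as \uXXXX or \UXXXXXXXX escapes.
raw_escaped = text.encode("raw_unicode_escape")

round_trip = escaped.decode("unicode_escape")
raw_round_trip = raw_escaped.decode("raw_unicode_escape")
```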
				1320
				1321
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1322	.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
				1323	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1324
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1325	Encode the :c:type:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1326	and return a Python bytes object. Return NULL if an exception was raised by
				1327	the codec.
				1328
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1329	.. deprecated-removed:: 3.3 4.0
				1330	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1331	:c:func:`PyUnicode_AsRawUnicodeEscapeString`.
				1332
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1333
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1334	Latin-1 Codecs
				1335	""""""""""""""
				1336
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1337	These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
				1338	ordinals and only these are accepted by the codecs during encoding.
				1339
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1340
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1341	.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1342
				1343	Create a Unicode object by decoding size bytes of the Latin-1 encoded string
				1344	s. Return NULL if an exception was raised by the codec.
				1345
				1346
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1347	.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				1348
				1349	Encode a Unicode object using Latin-1 and return the result as a Python bytes
				1350	object. Error handling is "strict". Return NULL if an exception was
				1351	raised by the codec.
				1352
				1353
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1354	.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1355
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1356	Encode the :c:type:`Py_UNICODE` buffer of the given size using Latin-1 and
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1357	return a Python bytes object. Return NULL if an exception was raised by
				1358	the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1359
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1360	.. deprecated-removed:: 3.3 4.0
				1361	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1362	:c:func:`PyUnicode_AsLatin1String`.
				1363
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1364
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1365	ASCII Codecs
				1366	""""""""""""
				1367
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1368	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				1369	codes generate errors.
				1370
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1371
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1372	.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1373
				1374	Create a Unicode object by decoding size bytes of the ASCII encoded string
				1375	s. Return NULL if an exception was raised by the codec.
				1376
				1377
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1378	.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				1379
				1380	Encode a Unicode object using ASCII and return the result as a Python bytes
				1381	object. Error handling is "strict". Return NULL if an exception was
				1382	raised by the codec.
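The accepted ranges of the Latin-1 and ASCII encoders under "strict" error handling can be checked with a small Python sketch. The helper ``try_encode`` is hypothetical and exists only for this example; ``str.encode`` is assumed to behave like the corresponding C functions.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if strict encoding fails."""
    try:
        return text.encode(encoding)      # error handling defaults to "strict"
    except UnicodeEncodeError:
        return None

latin1_ok = try_encode("é", "latin-1")    # U+00E9 is within the first 256 ordinals
latin1_bad = try_encode("€", "latin-1")   # U+20AC is not
ascii_ok = try_encode("e", "ascii")       # 7-bit data is accepted
ascii_bad = try_encode("é", "ascii")      # anything above 127 is rejected
```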
				1383
				1384
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1385	.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1386
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1387	Encode the :c:type:`Py_UNICODE` buffer of the given size using ASCII and
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1388	return a Python bytes object. Return NULL if an exception was raised by
				1389	the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1390
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1391	.. deprecated-removed:: 3.3 4.0
				1392	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1393	:c:func:`PyUnicode_AsASCIIString`.
				1394
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1395
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1396	Character Map Codecs
				1397	""""""""""""""""""""
				1398
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1399	This codec is special in that it can be used to implement many different codecs
				1400	(and this is in fact what was done to obtain most of the standard codecs
				1401	included in the :mod:`encodings` package). The codec uses mappings to encode and
				1402	decode characters.
				1403
				1404	Decoding mappings must map single string characters to single Unicode
				1405	characters, integers (which are then interpreted as Unicode ordinals) or None
				1406	(meaning "undefined mapping" and causing an error).
				1407
				1408	Encoding mappings must map single Unicode characters to single string
				1409	characters, integers (which are then interpreted as Latin-1 ordinals) or None
				1410	(meaning "undefined mapping" and causing an error).
				1411
				1412	The mapping objects provided need only support the :meth:`__getitem__` mapping
				1413	interface.
				1414
				1415	If a character lookup fails with a LookupError, the character is copied as-is,
				1416	meaning that its ordinal value will be interpreted as a Unicode or Latin-1
				1417	ordinal, respectively. Because of this, mappings only need to contain entries
				1418	that map characters to different code points.
				1419
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1420	These are the mapping codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1421
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1422	.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
				1423	PyObject *mapping, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1424
				1425	Create a Unicode object by decoding size bytes of the encoded string s using
				1426	the given mapping object. Return NULL if an exception was raised by the
				1427	codec. If mapping is NULL, Latin-1 decoding will be done. Otherwise, it can be
				1428	a dictionary keyed by byte value, or a Unicode string, which is treated as a
				1429	lookup table. Byte values beyond the end of the string and U+FFFE "characters" are
				1430	treated as "undefined mapping".
				1431
				1432
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1433	.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1434
				1435	Encode a Unicode object using the given mapping object and return the result
				1436	as a Python bytes object. Error handling is "strict". Return NULL if an
				1437	exception was raised by the codec.
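A minimal sketch of charmap decoding and encoding at the Python level, using ``codecs.charmap_decode`` and ``codecs.charmap_encode``. The tables are made up for illustration, and the example assumes mappings keyed by integer ordinals, as the Python-level functions use them:

```python
import codecs

table = "abc"    # string lookup table: byte 0 -> 'a', 1 -> 'b', 2 -> 'c'

decoded, n = codecs.charmap_decode(b"\x02\x00\x01", "strict", table)

# Dictionary values may be strings or integers (taken as Unicode ordinals).
decoded_dict, _ = codecs.charmap_decode(b"\x00\x01", "strict", {0: "x", 1: 0x79})

# Byte 7 lies beyond the end of the table: undefined mapping, here replaced.
replaced, _ = codecs.charmap_decode(b"\x00\x07", "replace", table)

# Encoding maps Unicode ordinals to integers (byte values) or bytes objects.
encoded, _ = codecs.charmap_encode("xy", "strict", {ord("x"): 0, ord("y"): b"\x01"})
```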
				1438
				1439	The following codec API is special in that it maps Unicode to Unicode.
				1440
				1441
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1442	.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
				1443	PyObject *table, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1444
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1445	Translate a :c:type:`Py_UNICODE` buffer of the given size by applying a
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1446	character mapping table to it and return the resulting Unicode object. Return
				1447	NULL when an exception was raised by the codec.
				1448
				1449	The mapping table must map Unicode ordinal integers to Unicode ordinal
				1450	integers or None (causing deletion of the character).
				1451
				1452	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				1453	and sequences work well. Unmapped character ordinals (ones which cause a
				1454	:exc:`LookupError`) are left untouched and are copied as-is.
				1455
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1456	.. deprecated-removed:: 3.3 4.0
				1457	Part of the old-style :c:type:`Py_UNICODE` API.
				1458
				1459	.. XXX replace with what?
Jeroen Ruigrok van der Werven	47a7d70	2009-04-27 05:43:17 +0000	[diff] [blame]	1460
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1461
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1462	.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
				1463	PyObject *mapping, const char *errors)
				1464
				1465	Encode the :c:type:`Py_UNICODE` buffer of the given size using the given
				1466	mapping object and return a Python bytes object. Return NULL if an
				1467	exception was raised by the codec.
				1468
				1469	.. deprecated-removed:: 3.3 4.0
				1470	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1471	:c:func:`PyUnicode_AsCharmapString`.
				1472
				1473
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1474	MBCS codecs for Windows
				1475	"""""""""""""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1476
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1477	These are the MBCS codec APIs. They are currently only available on Windows and
				1478	use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
				1479	DBCS) is a class of encodings, not just one. The target encoding is defined by
				1480	the user settings on the machine running the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1481
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1482	.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1483
				1484	Create a Unicode object by decoding size bytes of the MBCS encoded string s.
				1485	Return NULL if an exception was raised by the codec.
				1486
				1487
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1488	.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, \
				1489	const char *errors, int *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1490
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1491	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeMBCS`. If
				1492	consumed is not NULL, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode a
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1493	trailing lead byte and the number of bytes that have been decoded will be stored
				1494	in consumed.
				1495
				1496
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1497	.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
				1498
				1499	Encode a Unicode object using MBCS and return the result as a Python bytes
				1500	object. Error handling is "strict". Return NULL if an exception was
				1501	raised by the codec.
				1502
				1503
Victor Stinner	b682101	2011-12-09 00:18:11 +0100	[diff] [blame]	1504	.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)
				1505
				1506	Encode the Unicode object using the specified code page and return a Python
				1507	bytes object. Return NULL if an exception was raised by the codec. Use
				1508	:c:data:`CP_ACP` code page to get the MBCS encoder.
				1509
				1510	.. versionadded:: 3.3
				1511
				1512
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1513	.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1514
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1515	Encode the :c:type:`Py_UNICODE` buffer of the given size using MBCS and return
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1516	a Python bytes object. Return NULL if an exception was raised by the
				1517	codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1518
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1519	.. deprecated-removed:: 3.3 4.0
				1520	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
Victor Stinner	b682101	2011-12-09 00:18:11 +0100	[diff] [blame]	1521	:c:func:`PyUnicode_AsMBCSString` or :c:func:`PyUnicode_EncodeCodePage`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1522
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1523
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1524	Methods & Slots
				1525	"""""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1526
				1527
				1528	.. _unicodemethodsandslots:
				1529
				1530	Methods and Slot Functions
				1531	^^^^^^^^^^^^^^^^^^^^^^^^^^
				1532
				1533	The following APIs are capable of handling Unicode objects and strings on input
				1534	(we refer to them as strings in the descriptions) and return Unicode objects or
				1535	integers as appropriate.
				1536
				1537	They all return NULL or ``-1`` if an exception occurs.
				1538
				1539
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1540	.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1541
				1542	Concatenate two strings, giving a new Unicode string.
				1543
				1544
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1545	.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1546
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1547	Split a string giving a list of Unicode strings. If sep is NULL, splitting
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1548	will be done at all whitespace substrings. Otherwise, splits occur at the given
				1549	separator. At most maxsplit splits will be done. If maxsplit is negative, no limit is
				1550	set. Separators are not included in the resulting list.
				1551
				1552
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1553	.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1554
				1555	Split a Unicode string at line breaks, returning a list of Unicode strings.
				1556	CRLF is considered to be one line break. If keepend is 0, the line break
				1557	characters are not included in the resulting strings.
				1558
				1559
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1560	.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \
				1561	const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1562
				1563	Translate a string by applying a character mapping table to it and return the
				1564	resulting Unicode object.
				1565
				1566	The mapping table must map Unicode ordinal integers to Unicode ordinal integers
				1567	or None (causing deletion of the character).
				1568
				1569	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				1570	and sequences work well. Unmapped character ordinals (ones which cause a
				1571	:exc:`LookupError`) are left untouched and are copied as-is.
				1572
				1573	errors has the usual meaning for codecs. It may be NULL, which indicates that
				1574	the default error handling should be used.
				1575
				1576
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1577	.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1578
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1579	Join a sequence of strings using the given separator and return the resulting
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1580	Unicode string.
				1581
				1582
Victor Stinner	13d3aa5	2014-10-09 11:11:25 +0200	[diff] [blame]	1583	.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1584	Py_ssize_t start, Py_ssize_t end, int direction)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1585
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1586	Return 1 if substr matches ``str[start:end]`` at the given tail end
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1587	(direction == -1 means to do a prefix match, direction == 1 a suffix match),
				1588	0 otherwise. Return ``-1`` if an error occurred.
				1589
				1590
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1591	.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
				1592	Py_ssize_t start, Py_ssize_t end, int direction)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1593
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1594	Return the first position of substr in ``str[start:end]`` using the given
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1595	direction (direction == 1 means to do a forward search, direction == -1 a
				1596	backward search). The return value is the index of the first match; a value of
				1597	``-1`` indicates that no match was found, and ``-2`` indicates that an error
				1598	occurred and an exception has been set.
				1599
				1600
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1601	.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
				1602	Py_ssize_t start, Py_ssize_t end, int direction)
Martin v. Löwis	d63a3b8	2011-09-28 07:41:54 +0200	[diff] [blame]	1603
				1604	Return the first position of the character ch in ``str[start:end]`` using
				1605	the given direction (direction == 1 means to do a forward search,
				1606	direction == -1 a backward search). The return value is the index of the
				1607	first match; a value of ``-1`` indicates that no match was found, and ``-2``
				1608	indicates that an error occurred and an exception has been set.
				1609
Georg Brandl	ee12f44	2011-09-28 21:51:06 +0200	[diff] [blame]	1610	.. versionadded:: 3.3
				1611
Martin v. Löwis	d63a3b8	2011-09-28 07:41:54 +0200	[diff] [blame]	1612
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1613	.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
				1614	Py_ssize_t start, Py_ssize_t end)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1615
				1616	Return the number of non-overlapping occurrences of substr in
				1617	``str[start:end]``. Return ``-1`` if an error occurred.
				1618
				1619
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1620	.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
				1621	PyObject *replstr, Py_ssize_t maxcount)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1622
				1623	Replace at most maxcount occurrences of substr in str with replstr and
				1624	return the resulting Unicode object. maxcount == -1 means replace all
				1625	occurrences.
				1626
				1627
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1628	.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1629
				1630	Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
				1631	respectively.
				1632
				1633
Serhiy Storchaka	03863d2	2015-06-21 17:11:21 +0300	[diff] [blame]	1634	.. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, const char *string)
Benjamin Peterson	c22ed14	2008-07-01 19:12:34 +0000	[diff] [blame]	1635
				1636	Compare a unicode object, uni, with string and return -1, 0, 1 for less
Victor Stinner	80e788a	2010-12-28 23:39:51 +0000	[diff] [blame]	1637	than, equal, and greater than, respectively. It is best to pass only
				1638	ASCII-encoded strings, but the function interprets the input string as
Zachary Ware	780b585	2014-06-06 09:13:18 -0500	[diff] [blame]	1639	ISO-8859-1 if it contains non-ASCII characters.
Benjamin Peterson	c22ed14	2008-07-01 19:12:34 +0000	[diff] [blame]	1640
				1641
Eli Bendersky	0813168	2012-06-03 08:07:47 +0300	[diff] [blame]	1642	.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1643
				1644	Rich compare two unicode strings and return one of the following:
				1645
				1646	* ``NULL`` in case an exception was raised
				1647	* :const:`Py_True` or :const:`Py_False` for successful comparisons
				1648	* :const:`Py_NotImplemented` in case the type combination is unknown
				1649
				1650	Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
				1651	:exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
				1652	with a :exc:`UnicodeDecodeError`.
				1653
				1654	Possible values for op are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
				1655	:const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
				1656
				1657
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1658	.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1659
				1660	Return a new string object from format and args; this is analogous to
Benjamin Peterson	102488b	2014-07-19 16:34:33 -0700	[diff] [blame]	1661	``format % args``.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1662
				1663
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1664	.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1665
				1666	Check whether element is contained in container and return true or false
				1667	accordingly.
				1668
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1669	element has to coerce to a one-element Unicode string. ``-1`` is returned
				1670	if there was an error.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1671
				1672
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1673	.. c:function:: void PyUnicode_InternInPlace(PyObject **string)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1674
				1675	Intern the argument *\*string* in place. The argument must be the address of a
				1676	pointer variable pointing to a Python unicode string object. If there is an
				1677	existing interned string that is the same as *\*string*, it sets *\*string* to
				1678	it (decrementing the reference count of the old string object and incrementing
				1679	the reference count of the interned string object), otherwise it leaves
				1680	*\*string* alone and interns it (incrementing its reference count).
				1681	(Clarification: even though there is a lot of talk about reference counts, think
				1682	of this function as reference-count-neutral; you own the object after the call
				1683	if and only if you owned it before the call.)
				1684
				1685
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1686	.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1687
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1688	A combination of :c:func:`PyUnicode_FromString` and
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1689	:c:func:`PyUnicode_InternInPlace`, returning either a new unicode string
				1690	object that has been interned, or a new ("owned") reference to an earlier
				1691	interned string object with the same value.