.. highlightlang:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
.. sectionauthor:: Georg Brandl <georg@python.org>

Unicode Objects
^^^^^^^^^^^^^^^

Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
use a variety of representations, in order to allow handling the complete range
of Unicode characters while staying memory efficient. There are special cases
for strings where all code points are below 128, 256, or 65536; otherwise, code
points must be below 1114112 (which is the full Unicode range).

:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
and inefficient; it should be avoided in performance- or memory-sensitive
situations.

Due to the transition between the old APIs and the new APIs, unicode objects
can internally be in two states depending on how they were created:

* "canonical" unicode objects are all objects created by a non-deprecated
  unicode API. They use the most efficient representation allowed by the
  implementation.

* "legacy" unicode objects have been created through one of the deprecated
  APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
  :c:type:`Py_UNICODE*` representation; you will have to call
  :c:func:`PyUnicode_READY` on them before calling any other API.


Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:

.. c:type:: Py_UCS4
            Py_UCS2
            Py_UCS1

   These types are typedefs for unsigned integer types wide enough to contain
   characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with
   single Unicode characters, use :c:type:`Py_UCS4`.

   .. versionadded:: 3.3


.. c:type:: Py_UNICODE

   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
   depending on the platform.

   .. versionchanged:: 3.3
      In previous versions, this was a 16-bit type or a 32-bit type depending on
      whether you selected a "narrow" or "wide" Unicode version of Python at
      build time.


.. c:type:: PyASCIIObject
            PyCompactUnicodeObject
            PyUnicodeObject

   These subtypes of :c:type:`PyObject` represent a Python Unicode object. In
   almost all cases, they shouldn't be used directly, since all API functions
   that deal with Unicode objects take and return :c:type:`PyObject` pointers.

   .. versionadded:: 3.3


.. c:var:: PyTypeObject PyUnicode_Type

   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
   is exposed to Python code as ``str``.


The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:

.. c:function:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.


.. c:function:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.


.. c:function:: int PyUnicode_READY(PyObject *o)

   Ensure the string object *o* is in the "canonical" representation. This is
   required before using any of the access macros described below.

   .. XXX expand on when it is not required

   Returns 0 on success and -1 with an exception set on failure, which in
   particular happens if memory allocation fails.

   .. versionadded:: 3.3


.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)

   Return the length of the Unicode string, in code points. *o* has to be a
   Unicode object in the "canonical" representation (not checked).

   .. versionadded:: 3.3


.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
                Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
                Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)

   Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
   integer types for direct character access. No check is made that the
   canonical representation actually uses that character size; use
   :c:func:`PyUnicode_KIND` to select the right macro. Make sure
   :c:func:`PyUnicode_READY` has been called before accessing this.

   .. versionadded:: 3.3
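
   As an illustration, here is a minimal sketch of choosing the typed accessor
   from the kind. ``count_non_ascii`` is a hypothetical helper, not part of the
   API, and the sketch assumes the full (non-limited) C API is available:

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: count the code points >= 128 in a str object.
         Returns -1 with an exception set on failure. */
      static Py_ssize_t
      count_non_ascii(PyObject *str)
      {
          Py_ssize_t i, len, count = 0;

          if (!PyUnicode_Check(str)) {
              PyErr_SetString(PyExc_TypeError, "expected a str object");
              return -1;
          }
          /* The typed accessors require the canonical representation. */
          if (PyUnicode_READY(str) < 0)
              return -1;

          len = PyUnicode_GET_LENGTH(str);
          switch (PyUnicode_KIND(str)) {
          case PyUnicode_1BYTE_KIND: {
              const Py_UCS1 *p = PyUnicode_1BYTE_DATA(str);
              for (i = 0; i < len; i++)
                  count += (p[i] >= 128);
              break;
          }
          case PyUnicode_2BYTE_KIND: {
              const Py_UCS2 *p = PyUnicode_2BYTE_DATA(str);
              for (i = 0; i < len; i++)
                  count += (p[i] >= 128);
              break;
          }
          default: {  /* PyUnicode_4BYTE_KIND once the string is ready */
              const Py_UCS4 *p = PyUnicode_4BYTE_DATA(str);
              for (i = 0; i < len; i++)
                  count += (p[i] >= 128);
              break;
          }
          }
          return count;
      }

   Each branch loops over a correctly typed pointer, so the kind is examined
   once per string rather than once per character.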


.. c:macro:: PyUnicode_WCHAR_KIND
             PyUnicode_1BYTE_KIND
             PyUnicode_2BYTE_KIND
             PyUnicode_4BYTE_KIND

   Return values of the :c:func:`PyUnicode_KIND` macro.

   .. versionadded:: 3.3


.. c:function:: int PyUnicode_KIND(PyObject *o)

   Return one of the PyUnicode kind constants (see above), indicating how many
   bytes per character this Unicode object uses to store its data. *o* has to
   be a Unicode object in the "canonical" representation (not checked).

   .. XXX document "0" return value?

   .. versionadded:: 3.3


.. c:function:: void* PyUnicode_DATA(PyObject *o)

   Return a void pointer to the raw unicode buffer. *o* has to be a Unicode
   object in the "canonical" representation (not checked).

   .. versionadded:: 3.3


.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
                                     Py_UCS4 value)

   Write into a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`). This macro does not do any sanity checks and is
   intended for usage in loops. The caller should cache the *kind* value and
   *data* pointer as obtained from other macro calls. *index* is the index in
   the string (starts at 0) and *value* is the new code point value which should
   be written to that location.

   .. versionadded:: 3.3
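
   A minimal sketch of the intended pattern: allocate with
   :c:func:`PyUnicode_New`, cache the kind and data pointer, then fill the
   buffer with :c:func:`PyUnicode_WRITE`. ``repeat_code_point`` is a
   hypothetical helper, and *ch* is assumed to be a valid code point (at most
   ``0x10FFFF``):

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return a new str made of `count` copies of
         the code point `ch`, or NULL with an exception set. */
      static PyObject *
      repeat_code_point(Py_UCS4 ch, Py_ssize_t count)
      {
          PyObject *result;
          int kind;
          void *data;
          Py_ssize_t i;

          /* Passing the exact maximum lets PyUnicode_New pick the
             narrowest storage kind. */
          result = PyUnicode_New(count, ch);
          if (result == NULL)
              return NULL;

          /* Cache kind and data once; PyUnicode_WRITE does no checks. */
          kind = PyUnicode_KIND(result);
          data = PyUnicode_DATA(result);
          for (i = 0; i < count; i++)
              PyUnicode_WRITE(kind, data, i, ch);
          return result;
      }

   Writing like this is only appropriate for a freshly created string that has
   not yet been handed to other code.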


.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)

   Read a code point from a canonical representation *data* (as obtained with
   :c:func:`PyUnicode_DATA`). No checks or ready calls are performed.

   .. versionadded:: 3.3
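
   For example, a scan that looks up *kind* and *data* once and then calls
   :c:func:`PyUnicode_READ` in the loop might look like this minimal sketch
   (``max_code_point`` is a hypothetical helper, assumed to receive a ``str``):

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return the largest code point in a str
         (0 if it is empty), or (Py_UCS4)-1 with an exception set. */
      static Py_UCS4
      max_code_point(PyObject *str)
      {
          int kind;
          void *data;
          Py_ssize_t i, len;
          Py_UCS4 max = 0;

          if (PyUnicode_READY(str) < 0)
              return (Py_UCS4)-1;

          /* Look up the kind and data pointer once, outside the loop. */
          kind = PyUnicode_KIND(str);
          data = PyUnicode_DATA(str);
          len = PyUnicode_GET_LENGTH(str);

          for (i = 0; i < len; i++) {
              Py_UCS4 ch = PyUnicode_READ(kind, data, i);
              if (ch > max)
                  max = ch;
          }
          return max;
      }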


.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)

   Read a character from a Unicode object *o*, which must be in the "canonical"
   representation. This is less efficient than :c:func:`PyUnicode_READ` if you
   do multiple consecutive reads.

   .. versionadded:: 3.3
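
   For a single lookup, this macro saves fetching the kind and data pointer
   yourself. A minimal sketch, with ``last_code_point`` as a hypothetical helper
   that expects a ``str``:

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return the last code point of a str, or
         (Py_UCS4)-1 with an exception set if the string is empty or
         cannot be made ready. */
      static Py_UCS4
      last_code_point(PyObject *str)
      {
          Py_ssize_t len;

          if (PyUnicode_READY(str) < 0)
              return (Py_UCS4)-1;
          len = PyUnicode_GET_LENGTH(str);
          if (len == 0) {
              PyErr_SetString(PyExc_IndexError, "empty string");
              return (Py_UCS4)-1;
          }
          /* One-off access: no need to cache kind and data. */
          return PyUnicode_READ_CHAR(str, len - 1);
      }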


.. c:function:: Py_UCS4 PyUnicode_MAX_CHAR_VALUE(PyObject *o)

   Return the maximum code point that is suitable for creating another string
   based on *o*, which must be in the "canonical" representation. The result is
   only an approximation (an upper bound on the code points actually present),
   but obtaining it is more efficient than iterating over the string.

   .. versionadded:: 3.3
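
   Since the result is never smaller than any code point stored in *o*, it can
   be passed directly to :c:func:`PyUnicode_New` when building a string made
   from the same characters. A minimal sketch (``reverse_str`` is a
   hypothetical helper that expects a ``str``):

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return a new str with the code points of `str`
         in reverse order, or NULL with an exception set. */
      static PyObject *
      reverse_str(PyObject *str)
      {
          PyObject *result;
          int src_kind, dst_kind;
          void *src, *dst;
          Py_ssize_t i, len;

          if (PyUnicode_READY(str) < 0)
              return NULL;
          len = PyUnicode_GET_LENGTH(str);

          /* Same code points, so the source's (rounded-up) maximum fits. */
          result = PyUnicode_New(len, PyUnicode_MAX_CHAR_VALUE(str));
          if (result == NULL)
              return NULL;

          src_kind = PyUnicode_KIND(str);
          src = PyUnicode_DATA(str);
          dst_kind = PyUnicode_KIND(result);
          dst = PyUnicode_DATA(result);
          for (i = 0; i < len; i++)
              PyUnicode_WRITE(dst_kind, dst, len - 1 - i,
                              PyUnicode_READ(src_kind, src, i));
          return result;
      }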


.. c:function:: int PyUnicode_ClearFreeList()

   Clear the free list. Return the total number of freed items.


.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
   code units (this includes surrogate pairs as 2 units). *o* has to be a
   Unicode object (not checked).

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.


.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation in
   bytes. *o* has to be a Unicode object (not checked).

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API, please migrate to using
      :c:func:`PyUnicode_GET_LENGTH`.


.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
                const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
   ``AS_DATA`` form casts the pointer to :c:type:`const char *`. *o* has to be
   a Unicode object (not checked).

   .. versionchanged:: 3.3
      This macro is now inefficient, because in many cases the
      :c:type:`Py_UNICODE` representation does not exist and needs to be
      created, and it can fail (return NULL with an exception set). Try to port
      the code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
      :c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style Unicode API, please migrate to using the
      :c:func:`PyUnicode_nBYTE_DATA` family of macros.


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.


.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a whitespace character.


.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a lowercase character.


.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an uppercase character.


.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a titlecase character.


.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a linebreak character.


.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a decimal character.


.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a digit character.


.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a numeric character.


.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphabetic character.


.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphanumeric character.


.. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a printable character.
   Nonprintable characters are those characters defined in the Unicode character
   database as "Other" or "Separator", excepting the ASCII space (0x20) which is
   considered printable. (Note that printable characters in this context are
   those which should not be escaped when :func:`repr` is invoked on a string.
   It has no bearing on the handling of strings written to :data:`sys.stdout` or
   :data:`sys.stderr`.)
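
As an illustration of the predicates above, here is a minimal sketch that
counts whitespace in a ``str`` by combining :c:func:`PyUnicode_READ` with
:c:func:`Py_UNICODE_ISSPACE`. ``count_whitespace`` is a hypothetical helper,
and the sketch assumes Python 3.3 or later, where the macro can be given any
code point read as a :c:type:`Py_UCS4`:

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: count whitespace code points in a str,
      or return -1 with an exception set. */
   static Py_ssize_t
   count_whitespace(PyObject *str)
   {
       int kind;
       void *data;
       Py_ssize_t i, len, count = 0;

       if (PyUnicode_READY(str) < 0)
           return -1;
       kind = PyUnicode_KIND(str);
       data = PyUnicode_DATA(str);
       len = PyUnicode_GET_LENGTH(str);
       for (i = 0; i < len; i++) {
           /* Read into a variable: the macro may evaluate its argument twice. */
           Py_UCS4 ch = PyUnicode_READ(kind, data, i);
           if (Py_UNICODE_ISSPACE(ch))
               count++;
       }
       return count;
   }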


These APIs can be used for fast direct character conversions:


.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.

   .. deprecated:: 3.3
      This function uses simple case mappings.


.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a non-negative decimal integer. Return
   ``-1`` if this is not possible. This macro does not raise exceptions.


.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible. This macro does not raise exceptions.


.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
   possible. This macro does not raise exceptions.
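
For instance, :c:func:`Py_UNICODE_TODECIMAL` accepts decimal digits from any
script, not only ASCII ``0`` to ``9``. A minimal sketch that parses a whole
string of them (``parse_decimal`` is a hypothetical helper; overflow checking
is deliberately omitted):

.. code-block:: c

   #define PY_SSIZE_T_CLEAN
   #include <Python.h>

   /* Hypothetical helper: convert a str consisting only of decimal digits
      into a long.  Returns -1 without an exception for an empty string or a
      non-decimal character, and -1 with an exception set if the string
      cannot be made ready; PyErr_Occurred() tells the two apart. */
   static long
   parse_decimal(PyObject *str)
   {
       int kind;
       void *data;
       Py_ssize_t i, len;
       long value = 0;

       if (PyUnicode_READY(str) < 0)
           return -1;
       kind = PyUnicode_KIND(str);
       data = PyUnicode_DATA(str);
       len = PyUnicode_GET_LENGTH(str);
       if (len == 0)
           return -1;
       for (i = 0; i < len; i++) {
           int digit = Py_UNICODE_TODECIMAL(PyUnicode_READ(kind, data, i));
           if (digit < 0)
               return -1;               /* not a decimal character */
           value = value * 10 + digit;  /* may overflow for long inputs */
       }
       return value;
   }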


These APIs can be used to work with surrogates:

.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)

   Check if *ch* is a surrogate (``0xD800 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)

   Check if *ch* is a high surrogate (``0xD800 <= ch <= 0xDBFF``).

.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)

   Check if *ch* is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).

.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)

   Join two surrogate characters and return a single Py_UCS4 value.
   *high* and *low* are respectively the leading and trailing surrogates in a
   surrogate pair.
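
A minimal sketch of decoding one code point from a sequence of UTF-16 code
units with these macros. ``decode_utf16_unit`` is a hypothetical helper; it
assumes *n* is at least 1 and reports unpaired surrogates as ``(Py_UCS4)-1``:

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: decode the code point starting at units[0]
      (n >= 1 units available) and store the number of units consumed in
      *used.  Returns (Py_UCS4)-1 for a lone or misordered surrogate. */
   static Py_UCS4
   decode_utf16_unit(const Py_UCS2 *units, Py_ssize_t n, Py_ssize_t *used)
   {
       Py_UCS4 first = units[0];

       if (!Py_UNICODE_IS_SURROGATE(first)) {
           *used = 1;                 /* ordinary BMP code point */
           return first;
       }
       if (Py_UNICODE_IS_HIGH_SURROGATE(first) && n >= 2
               && Py_UNICODE_IS_LOW_SURROGATE(units[1])) {
           *used = 2;                 /* well-formed surrogate pair */
           return Py_UNICODE_JOIN_SURROGATES(first, units[1]);
       }
       *used = 1;
       return (Py_UCS4)-1;
   }

With the pair ``0xD83D 0xDE00`` the helper returns ``0x1F600`` and reports
two units consumed.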


Creating and accessing Unicode strings
""""""""""""""""""""""""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:

.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)

   Create a new Unicode object. *maxchar* should be the true maximum code point
   to be placed in the string. As an approximation, it can be rounded up to the
   nearest value in the sequence 127, 255, 65535, 1114111.

   This is the recommended way to allocate a new Unicode object. Objects
   created using this function are not resizable.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
                                                    Py_ssize_t size)

   Create a new Unicode object with the given *kind* (possible values are
   :c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
   :c:func:`PyUnicode_KIND`). The *buffer* must point to an array of *size*
   units of 1, 2 or 4 bytes per character, as given by the kind.

   .. versionadded:: 3.3
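
   A minimal sketch of wrapping a buffer of 16-bit code units with
   ``PyUnicode_2BYTE_KIND``. ``str_from_ucs2`` is a hypothetical helper, and the
   input is assumed to contain no surrogate pairs, because each unit is stored
   as one code point:

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return a new str holding `count` UCS-2 units
         from `units` (not necessarily NUL-terminated), or NULL with an
         exception set.  Surrogate pairs are NOT joined. */
      static PyObject *
      str_from_ucs2(const Py_UCS2 *units, Py_ssize_t count)
      {
          /* The data is copied, so the caller keeps ownership of `units`. */
          return PyUnicode_FromKindAndData(PyUnicode_2BYTE_KIND, units, count);
      }

   In CPython the new object is stored in the narrowest kind that fits its
   data (an implementation detail), so ASCII-only input yields a 1-byte string.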


.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)

   Create a Unicode object from the first *size* bytes of the char buffer *u*.
   The bytes will be interpreted as being UTF-8 encoded. The buffer is copied
   into the new object. If the buffer is not NULL, the return value might be a
   shared object, i.e. modification of the data is not allowed.

   If *u* is NULL, this function behaves like :c:func:`PyUnicode_FromUnicode`
   with the buffer set to NULL. This usage is deprecated in favor of
   :c:func:`PyUnicode_New`.
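
   Because the length is explicit, the bytes need not be NUL-terminated at
   *size*. A minimal sketch that decodes the key of a ``key=value`` line
   (``key_of`` is a hypothetical helper; the whole *line* is assumed to be
   NUL-terminated so that :c:func:`strchr` can search it):

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <string.h>

      /* Hypothetical helper: return the part of "key=value" before the first
         '=' (or the whole line if there is none) as a new str, or NULL with
         an exception set. */
      static PyObject *
      key_of(const char *line)
      {
          const char *eq = strchr(line, '=');
          Py_ssize_t nbytes = eq != NULL ? (Py_ssize_t)(eq - line)
                                         : (Py_ssize_t)strlen(line);

          /* Only the first nbytes bytes are decoded as UTF-8 and copied;
             invalid UTF-8 raises UnicodeDecodeError. */
          return PyUnicode_FromStringAndSize(line, nbytes);
      }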


.. c:function:: PyObject* PyUnicode_FromString(const char *u)

   Create a Unicode object from a UTF-8 encoded null-terminated char buffer
   *u*.


.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)

   Take a C :c:func:`printf`\ -style *format* string and a variable number of
   arguments, calculate the size of the resulting Python unicode string and return
   a string with the values formatted into it. The variable arguments must be C
   types and must correspond exactly to the format characters in the *format*
   ASCII-encoded string. The following format characters are allowed:

   .. % This should be exactly the same as the table in PyErr_Format.
   .. % The descriptions for %zd and %zu are wrong, but the truth is complicated
   .. % because not all compilers support the %z width modifier -- we fake it
   .. % when necessary via interpolating PY_FORMAT_SIZE_T.
   .. % Similar comments apply to the %ll width modifier and
   .. % PY_FORMAT_LONG_LONG.

   .. tabularcolumns:: |l|l|L|

   +-------------------+---------------------+--------------------------------+
   | Format Characters | Type                | Comment                        |
   +===================+=====================+================================+
   | :attr:`%%`        | n/a                 | The literal % character.       |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%c`        | int                 | A single character,            |
   |                   |                     | represented as a C int.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%d`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%d")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%u`        | unsigned int        | Exactly equivalent to          |
   |                   |                     | ``printf("%u")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%ld`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%ld")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%li`       | long                | Exactly equivalent to          |
   |                   |                     | ``printf("%li")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
   |                   |                     | ``printf("%lu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lld`      | long long           | Exactly equivalent to          |
   |                   |                     | ``printf("%lld")``.            |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%lli`      | long long           | Exactly equivalent to          |
   |                   |                     | ``printf("%lli")``.            |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%llu`      | unsigned long long  | Exactly equivalent to          |
   |                   |                     | ``printf("%llu")``.            |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zd")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zi`       | Py_ssize_t          | Exactly equivalent to          |
   |                   |                     | ``printf("%zi")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%zu`       | size_t              | Exactly equivalent to          |
   |                   |                     | ``printf("%zu")``.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%i`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%i")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%x`        | int                 | Exactly equivalent to          |
   |                   |                     | ``printf("%x")``.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%s`        | char\*              | A null-terminated C character  |
   |                   |                     | array.                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%p`        | void\*              | The hex representation of a C  |
   |                   |                     | pointer. Mostly equivalent to  |
   |                   |                     | ``printf("%p")`` except that   |
   |                   |                     | it is guaranteed to start with |
   |                   |                     | the literal ``0x`` regardless  |
   |                   |                     | of what the platform's         |
   |                   |                     | ``printf`` yields.             |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%A`        | PyObject\*          | The result of calling          |
   |                   |                     | :func:`ascii`.                 |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%U`        | PyObject\*          | A unicode object.              |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
   |                   |                     | NULL) and a null-terminated    |
   |                   |                     | C character array as a second  |
   |                   |                     | parameter (which will be used, |
   |                   |                     | if the first parameter is      |
   |                   |                     | NULL).                         |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%S`        | PyObject\*          | The result of calling          |
   |                   |                     | :c:func:`PyObject_Str`.        |
   +-------------------+---------------------+--------------------------------+
   | :attr:`%R`        | PyObject\*          | The result of calling          |
   |                   |                     | :c:func:`PyObject_Repr`.       |
   +-------------------+---------------------+--------------------------------+

   An unrecognized format character causes all the rest of the format string to be
   copied as-is to the result string, and any extra arguments discarded.

   .. note::

      The ``"%lld"`` and ``"%llu"`` format specifiers are only available
      when :const:`HAVE_LONG_LONG` is defined.

   .. note::
      The width formatter unit is the number of characters rather than bytes.
      The precision formatter unit is the number of bytes for ``"%s"`` and
      ``"%V"`` (if the ``PyObject*`` argument is NULL), and the number of
      characters for ``"%A"``, ``"%U"``, ``"%S"``, ``"%R"`` and ``"%V"``
      (if the ``PyObject*`` argument is not NULL).

   .. versionchanged:: 3.2
      Support for ``"%lld"`` and ``"%llu"`` added.

   .. versionchanged:: 3.3
      Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.

   .. versionchanged:: 3.4
      Support for width and precision formatters for ``"%s"``, ``"%A"``,
      ``"%U"``, ``"%V"``, ``"%S"``, ``"%R"`` added.
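
   A minimal sketch of a diagnostic-message builder; ``describe_item`` is a
   hypothetical helper. Each argument matches its format character exactly:
   ``%s`` takes a ``const char*``, ``%zd`` a :c:type:`Py_ssize_t` and ``%R`` a
   ``PyObject*``:

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return a new str such as "items[3] = 'x'",
         or NULL with an exception set. */
      static PyObject *
      describe_item(const char *name, Py_ssize_t index, PyObject *obj)
      {
          /* Variadic arguments are not converted to the expected types,
             and the compiler does not check them against the format. */
          return PyUnicode_FromFormat("%s[%zd] = %R", name, index, obj);
      }

   When calling :c:func:`PyUnicode_FromFormat` directly with an integer
   literal, cast it to the expected type (for example ``(Py_ssize_t)3`` for
   ``%zd``), because only the default argument promotions apply.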


.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)

   Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
   arguments: the *format* string and a ``va_list`` holding the values to format.
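
   This is the building block for your own variadic wrappers. A minimal sketch,
   in which ``prefixed_format`` and the ``"[mymodule] "`` prefix are
   hypothetical:

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <stdarg.h>

      /* Hypothetical printf-style helper: format the arguments, then
         prepend a fixed tag.  Returns a new str or NULL on error. */
      static PyObject *
      prefixed_format(const char *format, ...)
      {
          PyObject *body, *result;
          va_list vargs;

          va_start(vargs, format);
          body = PyUnicode_FromFormatV(format, vargs);
          va_end(vargs);
          if (body == NULL)
              return NULL;

          /* %U inserts another str object. */
          result = PyUnicode_FromFormat("[mymodule] %U", body);
          Py_DECREF(body);
          return result;
      }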


.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
                               const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.

   :class:`bytes`, :class:`bytearray` and other
   :term:`bytes-like objects <bytes-like object>`
   are decoded according to the given *encoding* and using the error handling
   defined by *errors*. Both can be *NULL* to have the interface use the default
   values (see the next section for details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns *NULL* if there was an error. The caller is responsible for
   decref'ing the returned object.
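
   A minimal sketch of a helper that accepts either :class:`str` or a
   bytes-like object. Because :c:func:`PyUnicode_FromEncodedObject` rejects
   :class:`str`, that case is handled first. ``as_text`` is a hypothetical
   name, and Latin-1 is an arbitrary encoding chosen for the example.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return obj as a str.  str objects are
         returned unchanged (new reference); bytes, bytearray and other
         bytes-like objects are decoded as Latin-1. */
      static PyObject *
      as_text(PyObject *obj)
      {
          if (PyUnicode_Check(obj)) {
              /* PyUnicode_FromEncodedObject() would raise TypeError here. */
              Py_INCREF(obj);
              return obj;
          }
          return PyUnicode_FromEncodedObject(obj, "latin-1", "strict");
      }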


.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)

   Return the length of the Unicode object, in code points.

   .. versionadded:: 3.3


.. c:function:: Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, \
                        Py_ssize_t to_start, PyObject *from, \
                        Py_ssize_t from_start, Py_ssize_t how_many)

   Copy characters from one Unicode object into another. This function performs
   character conversion when necessary and falls back to :c:func:`memcpy` if
   possible. The target *to* must not be shared (its reference count must be
   one). Returns ``-1`` and sets an exception on error, otherwise returns the
   number of copied characters.

   .. versionadded:: 3.3
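
   The sketch below, a hypothetical ``concat_two`` helper, builds a new string
   with :c:func:`PyUnicode_New` and fills it with
   :c:func:`PyUnicode_CopyCharacters`. It assumes both arguments are
   :class:`str` objects; in real code :c:func:`PyUnicode_Concat` already does
   this job.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: concatenate two str objects by copying their
         characters into a freshly allocated string. */
      static PyObject *
      concat_two(PyObject *a, PyObject *b)
      {
          Py_ssize_t la = PyUnicode_GetLength(a);
          Py_ssize_t lb = PyUnicode_GetLength(b);
          Py_UCS4 maxchar;
          PyObject *result;

          if (la < 0 || lb < 0)
              return NULL;
          /* The target must be wide enough for the widest character of
             either input, otherwise the copy fails. */
          maxchar = Py_MAX(PyUnicode_MAX_CHAR_VALUE(a),
                           PyUnicode_MAX_CHAR_VALUE(b));
          result = PyUnicode_New(la + lb, maxchar);
          if (result == NULL)
              return NULL;
          /* result is new and unshared, so it may be written to. */
          if (PyUnicode_CopyCharacters(result, 0, a, 0, la) < 0 ||
              PyUnicode_CopyCharacters(result, la, b, 0, lb) < 0) {
              Py_DECREF(result);
              return NULL;
          }
          return result;
      }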


.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
                        Py_ssize_t length, Py_UCS4 fill_char)

   Fill a string with a character: write *fill_char* into
   ``unicode[start:start+length]``.

   Fail if *fill_char* is bigger than the string maximum character, or if the
   string has more than 1 reference.

   Return the number of written characters, or return ``-1`` and raise an
   exception on error.

   .. versionadded:: 3.3
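
   A minimal sketch (the helper ``repeat_char`` is hypothetical) showing the two
   preconditions of :c:func:`PyUnicode_Fill`: the maximum character given to
   :c:func:`PyUnicode_New` must cover *fill_char*, and the string must not be
   shared. The empty result is returned without filling because it may be a
   shared object.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return a new string made of n copies of ch. */
      static PyObject *
      repeat_char(Py_UCS4 ch, Py_ssize_t n)
      {
          /* Using ch as the maximum character guarantees it fits. */
          PyObject *s = PyUnicode_New(n, ch);

          if (s == NULL || n == 0)
              return s;   /* the empty string may be shared: never fill it */
          if (PyUnicode_Fill(s, 0, n, ch) < 0) {
              Py_DECREF(s);
              return NULL;
          }
          return s;
      }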


.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
                                        Py_UCS4 character)

   Write a character to a string. The string must have been created through
   :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
   the string must not be shared and must not have been hashed yet.

   This function checks that *unicode* is a Unicode object, that the index is
   not out of bounds, and that the object can be modified safely (i.e. that
   its reference count is one), in contrast to the macro
   :c:func:`PyUnicode_WRITE`. Return ``0`` on success, or ``-1`` with an
   exception set on error.

   .. versionadded:: 3.3
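
   For illustration, a hypothetical ``from_code_points`` helper that writes
   characters one by one into a fresh string. Calling
   :c:func:`PyUnicode_FromKindAndData` with ``PyUnicode_4BYTE_KIND`` does the
   same in a single call; this sketch only demonstrates the preconditions of
   :c:func:`PyUnicode_WriteChar`.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: build a str from an array of n code points. */
      static PyObject *
      from_code_points(const Py_UCS4 *cps, Py_ssize_t n)
      {
          Py_UCS4 maxchar = 0;
          Py_ssize_t i;
          PyObject *s;

          /* PyUnicode_New() needs the largest code point up front. */
          for (i = 0; i < n; i++)
              if (cps[i] > maxchar)
                  maxchar = cps[i];
          s = PyUnicode_New(n, maxchar);
          if (s == NULL || n == 0)
              return s;   /* the empty string may be shared: do not write */
          for (i = 0; i < n; i++) {
              if (PyUnicode_WriteChar(s, i, cps[i]) < 0) {
                  Py_DECREF(s);
                  return NULL;
              }
          }
          return s;
      }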


.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)

   Read a character from a string. This function checks that *unicode* is a
   Unicode object and the index is not out of bounds, in contrast to the macro
   version :c:func:`PyUnicode_READ_CHAR`. On error, return ``(Py_UCS4)-1`` and
   set an exception.

   .. versionadded:: 3.3
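
   A minimal sketch combining :c:func:`PyUnicode_GetLength` and
   :c:func:`PyUnicode_ReadChar`; the helper ``count_char`` is hypothetical.
   Since each call re-checks its arguments, hot loops usually use the
   :c:func:`PyUnicode_READ` macro instead; this sketch favours safety over
   speed.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: count the occurrences of ch in str.
         Returns -1 with an exception set on error. */
      static Py_ssize_t
      count_char(PyObject *str, Py_UCS4 ch)
      {
          Py_ssize_t i, len, count = 0;

          len = PyUnicode_GetLength(str);
          if (len < 0)
              return -1;
          for (i = 0; i < len; i++) {
              Py_UCS4 c = PyUnicode_ReadChar(str, i);
              /* (Py_UCS4)-1 is only an error if an exception is set. */
              if (c == (Py_UCS4)-1 && PyErr_Occurred())
                  return -1;
              if (c == ch)
                  count++;
          }
          return count;
      }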


.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
                                              Py_ssize_t end)

   Return a substring of *str*, from character index *start* (included) to
   character index *end* (excluded). Negative indices are not supported.

   .. versionadded:: 3.3
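
   Since negative indices are not supported, callers compute non-negative bounds
   themselves. A sketch using the hypothetical helper ``last_n_chars``, which
   assumes *n* is not negative:

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: the Python equivalent of str[-n:] for n > 0,
         or the whole string if it is shorter than n characters. */
      static PyObject *
      last_n_chars(PyObject *str, Py_ssize_t n)
      {
          Py_ssize_t len = PyUnicode_GetLength(str);

          if (len < 0)
              return NULL;
          return PyUnicode_Substring(str, n < len ? len - n : 0, len);
      }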


.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
                                          Py_ssize_t buflen, int copy_null)

   Copy the string *u* into a UCS4 buffer, including a null character, if
   *copy_null* is set. Returns *NULL* and sets an exception on error (in
   particular, a :exc:`SystemError` if *buflen* is smaller than the length of
   *u*, plus one when *copy_null* is set). *buffer* is returned on success.

   .. versionadded:: 3.3
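
   A minimal sketch that copies into a fixed-size array; ``SHORT_BUFLEN`` and
   ``copy_short_string`` are hypothetical names. With *copy_null* set, the
   terminator takes one slot, so at most ``SHORT_BUFLEN - 1`` characters fit.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      #define SHORT_BUFLEN 16   /* hypothetical capacity, in code points */

      /* Hypothetical helper: copy str into buf (SHORT_BUFLEN code points)
         followed by a terminating 0.  Returns 0 on success, or -1 with an
         exception set, e.g. when str has SHORT_BUFLEN or more characters. */
      static int
      copy_short_string(PyObject *str, Py_UCS4 *buf)
      {
          /* copy_null = 1: one slot is reserved for the terminator. */
          if (PyUnicode_AsUCS4(str, buf, SHORT_BUFLEN, 1) == NULL)
              return -1;
          return 0;
      }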


.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)

   Copy the string *u* into a new UCS4 buffer that is allocated using
   :c:func:`PyMem_Malloc`. If this fails, *NULL* is returned with a
   :exc:`MemoryError` set. The returned buffer always has an extra null
   code point appended.

   .. versionadded:: 3.3
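
   A sketch that works on a private UCS4 copy and releases it with
   :c:func:`PyMem_Free`. The helper ``reverse_str`` is hypothetical, and it
   reverses code points, not user-perceived characters.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: return str with its code points reversed. */
      static PyObject *
      reverse_str(PyObject *str)
      {
          Py_ssize_t i, len = PyUnicode_GetLength(str);
          Py_UCS4 *buf, tmp;
          PyObject *result;

          if (len < 0)
              return NULL;
          buf = PyUnicode_AsUCS4Copy(str);   /* len code points + a null */
          if (buf == NULL)
              return NULL;
          for (i = 0; i < len / 2; i++) {
              tmp = buf[i];
              buf[i] = buf[len - 1 - i];
              buf[len - 1 - i] = tmp;
          }
          result = PyUnicode_FromKindAndData(PyUnicode_4BYTE_KIND, buf, len);
          PyMem_Free(buf);                   /* allocated with PyMem_Malloc() */
          return result;
      }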


Deprecated Py_UNICODE APIs
""""""""""""""""""""""""""

.. deprecated-removed:: 3.3 4.0

These API functions have been deprecated since the implementation of
:pep:`393`. Extension modules can continue using them, as they will not be
removed in Python 3.x, but should be aware that using them can now incur
performance and memory costs.


.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the :c:type:`Py_UNICODE` buffer *u* of the
   given *size*. *u* may be *NULL*, in which case the contents are undefined
   and it is the user's responsibility to fill in the needed data. The buffer
   is copied into the new object.

   If the buffer is not *NULL*, the return value might be a shared object.
   Therefore, modification of the resulting Unicode object is only allowed when
   *u* is *NULL*.

   If the buffer is *NULL*, :c:func:`PyUnicode_READY` must be called once the
   string content has been filled before using any of the access macros such as
   :c:func:`PyUnicode_KIND`.

   Please migrate to using :c:func:`PyUnicode_FromKindAndData` or
   :c:func:`PyUnicode_New`.


.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal
   :c:type:`Py_UNICODE` buffer, or *NULL* on error. This will create the
   :c:type:`Py_UNICODE*` representation of the object if it is not yet
   available. Note that the resulting :c:type:`Py_UNICODE` string may contain
   embedded null characters, which would cause the string to be truncated when
   used in most C functions.

   Please migrate to using :c:func:`PyUnicode_AsUCS4`,
   :c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new
   APIs.


.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)

   Create a Unicode object by replacing all decimal digits in the
   :c:type:`Py_UNICODE` buffer *s* of the given *size* with ASCII digits 0--9
   according to their decimal value. Return *NULL* if an exception occurs.


.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)

   Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
   array length in *size*. Note that the resulting :c:type:`Py_UNICODE*` string
   may contain embedded null characters, which would cause the string to be
   truncated when used in most C functions.

   .. versionadded:: 3.3


.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)

   Create a copy of a Unicode string ending with a null character. Return *NULL*
   and raise a :exc:`MemoryError` exception on memory allocation failure,
   otherwise return a newly allocated buffer (use :c:func:`PyMem_Free` to free
   the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
   contain embedded null characters, which would cause the string to be
   truncated when used in most C functions.

   .. versionadded:: 3.2

   Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.


.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
   code units (this includes surrogate pairs as 2 units).

   Please migrate to using :c:func:`PyUnicode_GetLength`.


.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Copy an instance of a Unicode subtype to a new true Unicode object if
   necessary. If *obj* is already a true Unicode object (not a subtype), return
   the reference with incremented refcount. Objects other than Unicode or its
   subtypes cause a :exc:`TypeError`.


Locale Encoding
"""""""""""""""

The current locale encoding can be used to decode text from the operating
system.

.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, \
                                                        Py_ssize_t len, \
                                                        const char *errors)

   Decode a string from the current locale encoding. The supported
   error handlers are ``"strict"`` and ``"surrogateescape"``
   (:pep:`383`). The decoder uses the ``"strict"`` error handler if
   *errors* is ``NULL``. *str* must end with a null character but
   cannot contain embedded null characters.

   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
   Python startup).

   .. seealso::

      The :c:func:`Py_DecodeLocale` function.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)

   Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string
   length using :c:func:`strlen`.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)

   Encode a Unicode object to the current locale encoding. The
   supported error handlers are ``"strict"`` and ``"surrogateescape"``
   (:pep:`383`). The encoder uses the ``"strict"`` error handler if
   *errors* is ``NULL``. Return a :class:`bytes` object. *unicode* cannot
   contain embedded null characters.

   Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
   :c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
   Python startup).

   .. seealso::

      The :c:func:`Py_EncodeLocale` function.

   .. versionadded:: 3.3
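
   A minimal round-trip sketch; ``locale_round_trip`` is a hypothetical helper.
   With ``"surrogateescape"`` on both sides, bytes that the locale cannot
   decode are expected to come back unchanged (:pep:`383`).

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: decode raw with the current locale encoding
         and encode the result back.  Returns a bytes object, or NULL with
         an exception set. */
      static PyObject *
      locale_round_trip(const char *raw)
      {
          PyObject *text, *bytes;

          text = PyUnicode_DecodeLocale(raw, "surrogateescape");
          if (text == NULL)
              return NULL;
          bytes = PyUnicode_EncodeLocale(text, "surrogateescape");
          Py_DECREF(text);
          return bytes;
      }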


File System Encoding
""""""""""""""""""""

To encode and decode file names and other environment strings,
:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
encode file names during argument parsing, the ``"O&"`` converter should be
used, passing :c:func:`PyUnicode_FSConverter` as the conversion function:

.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)

   ParseTuple converter: encode :class:`str` objects to :class:`bytes` using
   :c:func:`PyUnicode_EncodeFSDefault`; :class:`bytes` objects are output as-is.
   *result* must be a :c:type:`PyBytesObject*` which must be released when it is
   no longer used.

   .. versionadded:: 3.1
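
   A sketch of a module-level function that uses the converter; ``path_length``
   is a hypothetical name. The variable passed after the converter receives a
   new reference to a :class:`bytes` object, which the function releases once
   it is done with it.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical module function: return the length in bytes of the
         encoded file name.  Accepts str (encoded with
         PyUnicode_EncodeFSDefault()) or bytes (used as-is). */
      static PyObject *
      path_length(PyObject *self, PyObject *args)
      {
          PyObject *path_bytes = NULL;
          Py_ssize_t n;

          if (!PyArg_ParseTuple(args, "O&:path_length",
                                PyUnicode_FSConverter, &path_bytes))
              return NULL;
          n = PyBytes_GET_SIZE(path_bytes);
          Py_DECREF(path_bytes);   /* release it when no longer used */
          return PyLong_FromSsize_t(n);
      }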


To decode file names during argument parsing, the ``"O&"`` converter should be
used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:

.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)

   ParseTuple converter: decode :class:`bytes` objects to :class:`str` using
   :c:func:`PyUnicode_DecodeFSDefaultAndSize`; :class:`str` objects are output
   as-is. *result* must be a :c:type:`PyUnicodeObject*` which must be released
   when it is no longer used.

   .. versionadded:: 3.2
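
   The decoding direction looks the same; in this sketch the hypothetical
   ``path_as_str`` function accepts a file name as :class:`str` or
   :class:`bytes` and always returns :class:`str`.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical module function: return the file name as str. */
      static PyObject *
      path_as_str(PyObject *self, PyObject *args)
      {
          PyObject *path_str = NULL;

          if (!PyArg_ParseTuple(args, "O&:path_as_str",
                                PyUnicode_FSDecoder, &path_str))
              return NULL;
          /* path_str is already a new reference; hand it to the caller. */
          return path_str;
      }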


.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)

   Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
   ``"surrogateescape"`` error handler, or ``"strict"`` on Windows.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
   locale encoding and cannot be modified later. If you need to decode a string
   from the current locale encoding, use
   :c:func:`PyUnicode_DecodeLocaleAndSize`.

   .. seealso::

      The :c:func:`Py_DecodeLocale` function.

   .. versionchanged:: 3.2
      Use the ``"strict"`` error handler on Windows.


.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)

   Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
   and the ``"surrogateescape"`` error handler, or ``"strict"`` on Windows.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.

   .. versionchanged:: 3.2
      Use the ``"strict"`` error handler on Windows.


.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)

   Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
   ``"surrogateescape"`` error handler, or ``"strict"`` on Windows, and return
   :class:`bytes`. Note that the resulting :class:`bytes` object may contain
   null bytes.

   If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
   locale encoding.

   :c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
   locale encoding and cannot be modified later. If you need to encode a string
   to the current locale encoding, use :c:func:`PyUnicode_EncodeLocale`.

   .. seealso::

      The :c:func:`Py_EncodeLocale` function.

   .. versionadded:: 3.2


wchar_t Support
"""""""""""""""

:c:type:`wchar_t` support for platforms which support it:

.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :c:type:`wchar_t` buffer *w* of the given
   *size*. Passing -1 as the *size* indicates that the function must itself
   compute the length, using :c:func:`wcslen`. Return *NULL* on failure.


.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :c:type:`wchar_t` buffer *w*. At most
   *size* :c:type:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :c:type:`wchar_t` characters
   copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
   required by the application. Also, note that the :c:type:`wchar_t*` string
   might contain null characters, which would cause the string to be truncated
   when used with most C functions.
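
   Because the output may lack a terminator, a caller that needs a C wide
   string can reserve the last slot itself. A sketch with the hypothetical
   helper ``to_wide_buffer``; note that it silently truncates strings that do
   not fit.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <wchar.h>

      /* Hypothetical helper: copy str into buf (bufsize wchar_t units) and
         always null-terminate it.  Returns the number of units written,
         excluding the terminator, or -1 with an exception set. */
      static Py_ssize_t
      to_wide_buffer(PyObject *str, wchar_t *buf, Py_ssize_t bufsize)
      {
          Py_ssize_t n;

          if (bufsize < 1) {
              PyErr_SetString(PyExc_ValueError, "buffer too small");
              return -1;
          }
          /* Keep the last slot free for the terminator. */
          n = PyUnicode_AsWideChar(str, buf, bufsize - 1);
          if (n < 0)
              return -1;
          buf[n] = L'\0';
          return n;
      }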


.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)

   Convert the Unicode object to a wide character string. The output string
   always ends with a null character. If *size* is not *NULL*, write the number
   of wide characters (excluding the trailing 0-termination character) into
   *\*size*.

   Returns a buffer allocated by :c:func:`PyMem_Malloc` (use
   :c:func:`PyMem_Free` to free it) on success. On error, returns *NULL* and
   sets an exception such as :exc:`MemoryError`; *\*size* is undefined in that
   case. Note that the resulting :c:type:`wchar_t` string might contain null
   characters, which would cause the string to be truncated when used with most
   C functions.

   .. versionadded:: 3.2
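
   A sketch (hypothetical helper ``wide_round_trip``) that converts to a
   heap-allocated wide string and back with :c:func:`PyUnicode_FromWideChar`.
   Passing the returned length explicitly, rather than -1, preserves embedded
   null characters.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: str -> wchar_t* -> str. */
      static PyObject *
      wide_round_trip(PyObject *str)
      {
          Py_ssize_t len;
          wchar_t *wide;
          PyObject *back;

          wide = PyUnicode_AsWideCharString(str, &len);
          if (wide == NULL)
              return NULL;
          /* len excludes the terminating null character. */
          back = PyUnicode_FromWideChar(wide, len);
          PyMem_Free(wide);
          return back;
      }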


UCS4 Support
""""""""""""

.. versionadded:: 3.3

.. XXX are these meant to be public?

.. c:function:: size_t Py_UCS4_strlen(const Py_UCS4 *u)
                Py_UCS4* Py_UCS4_strcpy(Py_UCS4 *s1, const Py_UCS4 *s2)
                Py_UCS4* Py_UCS4_strncpy(Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
                Py_UCS4* Py_UCS4_strcat(Py_UCS4 *s1, const Py_UCS4 *s2)
                int Py_UCS4_strcmp(const Py_UCS4 *s1, const Py_UCS4 *s2)
                int Py_UCS4_strncmp(const Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
                Py_UCS4* Py_UCS4_strchr(const Py_UCS4 *s, Py_UCS4 c)
                Py_UCS4* Py_UCS4_strrchr(const Py_UCS4 *s, Py_UCS4 c)

   These utility functions work on strings of :c:type:`Py_UCS4` characters and
   otherwise behave like the C standard library functions with the same name.


.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as the corresponding arguments of the built-in
:func:`str` string object constructor.

Setting *encoding* to *NULL* causes the default encoding to be used,
which is UTF-8. The file system calls should use
:c:func:`PyUnicode_FSConverter` for encoding file names. This uses the
variable :c:data:`Py_FileSystemDefaultEncoding` internally. This
variable should be treated as read-only: on some systems, it will be a
pointer to a static string, on others, it will change at run-time
(such as when the application invokes :c:func:`setlocale`).

Error handling is set by *errors* which may also be set to *NULL* meaning to use
the default handling defined for the codec. Default error handling for all
built-in codecs is ``"strict"`` (:exc:`ValueError` is raised).

The codecs all use a similar interface. Only deviations from the following
generic ones are documented, for simplicity.


Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
                              const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same name
   in the :func:`str` built-in function. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.


.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
                              const char *encoding, const char *errors)

   Encode a Unicode object and return the result as a Python bytes object.
   *encoding* and *errors* have the same meaning as the parameters of the same
   name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.
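
   Combining :c:func:`PyUnicode_Decode` and :c:func:`PyUnicode_AsEncodedString`
   gives a simple transcoder. A minimal sketch; ``transcode`` is a hypothetical
   helper, and errors are ``"strict"`` on both sides.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>

      /* Hypothetical helper: re-encode size bytes of data from one codec to
         another.  Returns a new bytes object, or NULL with an exception set
         (e.g. UnicodeDecodeError for invalid input). */
      static PyObject *
      transcode(const char *data, Py_ssize_t size,
                const char *from_encoding, const char *to_encoding)
      {
          PyObject *text, *result;

          text = PyUnicode_Decode(data, size, from_encoding, "strict");
          if (text == NULL)
              return NULL;
          result = PyUnicode_AsEncodedString(text, to_encoding, "strict");
          Py_DECREF(text);
          return result;
      }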


.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
                              const char *encoding, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer *s* of the given *size* and return a Python
   bytes object. *encoding* and *errors* have the same meaning as the
   parameters of the same name in the Unicode :meth:`~str.encode` method. The codec
   to be used is looked up using the Python codec registry. Return *NULL* if an
   exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsEncodedString`.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.
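
   A sketch of incremental decoding with two chunks. ``decode_two_chunks`` is a
   hypothetical helper and assumes the leftover bytes plus the second chunk fit
   in a 256-byte buffer. The last chunk is decoded with
   :c:func:`PyUnicode_DecodeUTF8` so that a truncated sequence at the very end
   is still reported as an error.

   .. code-block:: c

      #define PY_SSIZE_T_CLEAN
      #include <Python.h>
      #include <string.h>

      /* Hypothetical helper: decode UTF-8 data split into chunks a and b;
         a multi-byte sequence may straddle the boundary. */
      static PyObject *
      decode_two_chunks(const char *a, Py_ssize_t alen,
                        const char *b, Py_ssize_t blen)
      {
          char carry[256];   /* assumption: leftover + blen fits here */
          Py_ssize_t consumed, left;
          PyObject *head, *tail, *result;

          /* An incomplete sequence at the end of a is left undecoded. */
          head = PyUnicode_DecodeUTF8Stateful(a, alen, "strict", &consumed);
          if (head == NULL)
              return NULL;
          left = alen - consumed;
          if (left + blen > (Py_ssize_t)sizeof(carry)) {
              Py_DECREF(head);
              PyErr_SetString(PyExc_ValueError, "chunk too large for sketch");
              return NULL;
          }
          /* Prepend the undecoded bytes of a to b. */
          memcpy(carry, a + consumed, (size_t)left);
          memcpy(carry + left, b, (size_t)blen);
          tail = PyUnicode_DecodeUTF8(carry, left + blen, "strict");
          if (tail == NULL) {
              Py_DECREF(head);
              return NULL;
          }
          result = PyUnicode_Concat(head, tail);
          Py_DECREF(head);
          Py_DECREF(tail);
          return result;
      }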


.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)

   Encode a Unicode object using UTF-8 and return the result as a Python bytes
   object. Error handling is ``"strict"``. Return *NULL* if an exception was
   raised by the codec.


.. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)

   Return a pointer to the UTF-8 encoding of the Unicode object, and store the
   size of the encoded representation (in bytes) in *size*. The *size* argument
   can be *NULL*; in this case no size will be stored.

   In the case of an error, *NULL* is returned with an exception set and no
   *size* is stored.

   This caches the UTF-8 representation of the string in the Unicode object, and
   subsequent calls will return a pointer to the same buffer. The caller is not
   responsible for deallocating the buffer.

   .. versionadded:: 3.3
				1060
				1061
				1062	.. c:function:: char* PyUnicode_AsUTF8(PyObject *unicode)
				1063
				1064	As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
				1065
				1066	.. versionadded:: 3.3
				1067
				1068
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1069	.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE s, Py_ssize_t size, const char errors)
				1070
				1071	Encode the :c:type:`Py_UNICODE` buffer s of the given size using UTF-8 and
				1072	return a Python bytes object. Return NULL if an exception was raised by
				1073	the codec.
				1074
				1075	.. deprecated-removed:: 3.3 4.0
				1076	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1077	:c:func:`PyUnicode_AsUTF8String` or :c:func:`PyUnicode_AsUTF8AndSize`.


UTF-32 Codecs
"""""""""""""

These are the UTF-32 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first four bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output.

   After completion, *\*byteorder* is set to the current byte order at the end
   of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.
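
   A minimal standalone sketch of the BOM rules above, using hypothetical helper
   names that are not part of the C API: ``utf32_detect_bom()`` decides how many
   leading bytes to skip and updates the byte order, and ``utf32_read()``
   assembles one 4-byte code unit in the chosen order::

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical helper: apply the BOM rules to s[0..size).  Returns the
         number of bytes to skip (4 if a BOM was consumed, else 0) and sets
         *byteorder to -1 or 1 when a BOM is found in native mode. */
      size_t utf32_detect_bom(const unsigned char *s, size_t size, int *byteorder)
      {
          if (*byteorder != 0 || size < 4)
              return 0;             /* explicit order: a BOM is ordinary data */
          if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
              *byteorder = -1;      /* FF FE 00 00: little-endian BOM */
              return 4;
          }
          if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
              *byteorder = 1;       /* 00 00 FE FF: big-endian BOM */
              return 4;
          }
          return 0;                 /* no BOM: stay in native order */
      }

      /* Hypothetical helper: read one 4-byte unit; order 0 means host order. */
      uint32_t utf32_read(const unsigned char *p, int order)
      {
          if (order == 0) {
              const uint32_t probe = 1;
              order = (*(const unsigned char *)&probe == 1) ? -1 : 1;
          }
          if (order < 0)
              return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                     (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
          return (uint32_t)p[3] | (uint32_t)p[2] << 8 |
                 (uint32_t)p[1] << 16 | (uint32_t)p[0] << 24;
      }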


.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
   by four) as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.


.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)

   Return a Python bytes object using the UTF-32 encoding in native byte
   order. The result always starts with a BOM. Error handling is "strict".
   Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
                              const char *errors, int byteorder)

   Return a Python bytes object holding the UTF-32 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output will always start with the Unicode BOM
   (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is not defined, surrogate pairs will be output
   as a single code point.

   Return *NULL* if an exception was raised by the codec.
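
   The byte-order rules above can be sketched as a standalone encoder over an
   array of code points. The helper names are hypothetical and this is not
   CPython's implementation; host order is detected at run time::

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical helper: store v as 4 bytes, little endian if order < 0,
         big endian otherwise. */
      void utf32_put(unsigned char *p, uint32_t v, int order)
      {
          for (int k = 0; k < 4; k++) {
              int shift = (order < 0) ? 8 * k : 8 * (3 - k);
              p[k] = (unsigned char)(v >> shift);
          }
      }

      /* Hypothetical helper: encode cps[0..n) following the byteorder table
         above.  out must hold 4 * (n + 1) bytes; returns the bytes written. */
      size_t utf32_encode(const uint32_t *cps, size_t n, int byteorder,
                          unsigned char *out)
      {
          size_t len = 0;
          int order = byteorder;

          if (order == 0) {
              const uint32_t probe = 1;
              order = (*(const unsigned char *)&probe == 1) ? -1 : 1;
              utf32_put(out, 0xFEFF, order);   /* native mode: BOM first */
              len = 4;
          }
          for (size_t i = 0; i < n; i++, len += 4)
              utf32_put(out + len, cps[i], order);
          return len;
      }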

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF32String`.


UTF-16 Codecs
"""""""""""""

These are the UTF-16 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first two bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output (where it will result in
   either a ``\ufeff`` or a ``\ufffe`` character).

   After completion, *\*byteorder* is set to the current byte order at the end
   of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.
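
   Once code units have been assembled in the selected byte order, a high
   surrogate followed by a low surrogate is combined into one code point above
   U+FFFF. A minimal standalone sketch of that step (a hypothetical helper, not
   CPython's decoder), treating an unpaired surrogate as an error as "strict"
   handling does::

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical helper: convert UTF-16 code units u[0..n) to code points.
         Returns the number of code points written to out, or (size_t)-1 on
         an unpaired surrogate. */
      size_t utf16_units_to_cps(const uint16_t *u, size_t n, uint32_t *out)
      {
          size_t k = 0;
          for (size_t i = 0; i < n; i++) {
              uint32_t c = u[i];
              if (c >= 0xD800 && c <= 0xDBFF) {           /* high surrogate */
                  if (i + 1 >= n || u[i + 1] < 0xDC00 || u[i + 1] > 0xDFFF)
                      return (size_t)-1;                  /* no low half */
                  c = 0x10000 + ((c - 0xD800) << 10) + (u[i + 1] - 0xDC00);
                  i++;                                    /* consumed the pair */
              } else if (c >= 0xDC00 && c <= 0xDFFF) {
                  return (size_t)-1;                      /* stray low half */
              }
              out[k++] = c;
          }
          return k;
      }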


.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.
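
   A minimal standalone sketch of which trailing bytes are held back, assuming the
   byte order is already known. The helper name is hypothetical, and it does not
   validate the rest of the input::

      #include <stddef.h>

      /* Hypothetical helper: for size bytes of UTF-16 in the given order
         (-1 little endian, 1 big endian), return how many bytes a stateful
         decoder can consume now.  An odd trailing byte and a trailing high
         surrogate (first half of a split pair) are held back. */
      size_t utf16_consumable(const unsigned char *s, size_t size, int order)
      {
          size_t n = size & ~(size_t)1;        /* drop an odd trailing byte */

          if (n >= 2) {
              const unsigned char *last = s + n - 2;
              unsigned unit = (order < 0)
                  ? ((unsigned)last[0] | (unsigned)last[1] << 8)
                  : ((unsigned)last[1] | (unsigned)last[0] << 8);
              if (unit >= 0xD800 && unit <= 0xDBFF)
                  n -= 2;                      /* wait for the low surrogate */
          }
          return n;
      }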


.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)

   Return a Python bytes object using the UTF-16 encoding in native byte
   order. The result always starts with a BOM. Error handling is "strict".
   Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
                              const char *errors, int byteorder)

   Return a Python bytes object holding the UTF-16 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output will always start with the Unicode BOM
   (U+FEFF). In the other two modes, no BOM is prepended.

   If *Py_UNICODE_WIDE* is defined, a single :c:type:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return *NULL* if an exception was raised by the codec.
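
   How a code point above U+FFFF becomes a surrogate pair can be shown with a
   minimal standalone sketch. The helper name is hypothetical, and it assumes *cp*
   is a valid code point that is not itself a surrogate::

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical helper: write cp as one or two UTF-16 code units and
         return how many units were written. */
      size_t utf16_encode_cp(uint32_t cp, uint16_t out[2])
      {
          if (cp < 0x10000) {
              out[0] = (uint16_t)cp;                     /* BMP: one unit */
              return 1;
          }
          cp -= 0x10000;                                 /* 20 bits remain */
          out[0] = (uint16_t)(0xD800 | (cp >> 10));      /* high surrogate */
          out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));    /* low surrogate */
          return 2;
      }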

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUTF16String`.


UTF-7 Codecs
""""""""""""

These are the UTF-7 codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeUTF7`. If
   *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
   be treated as an error. Those bytes will not be decoded and the number of
   bytes that have been decoded will be stored in *consumed*.


.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
                              int base64SetO, int base64WhiteSpace, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using UTF-7 and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.

   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
   special meaning) will be encoded in base-64. If *base64WhiteSpace* is
   nonzero, whitespace will be encoded in base-64. Both are set to zero for the
   Python "utf-7" codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API.

   .. XXX replace with what?


Unicode-Escape Codecs
"""""""""""""""""""""

These are the "Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
                              Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Unicode-Escape and return the result as a
   Python bytes object. Error handling is "strict". Return *NULL* if an exception
   was raised by the codec.
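
   The following standalone sketch approximates how the Unicode-Escape encoding
   writes a single code point. The helper name is hypothetical and the rules are a
   simplified reading of CPython's encoder (printable ASCII copied, backslash
   doubled, short escapes for tab, newline and carriage return, lowercase hex
   otherwise), so treat it as illustrative::

      #include <stdio.h>
      #include <stdint.h>

      /* Hypothetical helper: write the escaped form of cp into out (11 bytes
         needed, NUL-terminated) and return the number of characters. */
      int unicode_escape_cp(uint32_t cp, char *out)
      {
          if (cp == '\\')
              return sprintf(out, "\\\\");             /* backslash doubled */
          if (cp >= 0x20 && cp < 0x7F)
              return sprintf(out, "%c", (int)cp);      /* printable ASCII */
          if (cp == '\t')
              return sprintf(out, "\\t");
          if (cp == '\n')
              return sprintf(out, "\\n");
          if (cp == '\r')
              return sprintf(out, "\\r");
          if (cp < 0x100)
              return sprintf(out, "\\x%02x", (unsigned)cp);
          if (cp < 0x10000)
              return sprintf(out, "\\u%04x", (unsigned)cp);
          return sprintf(out, "\\U%08lx", (unsigned long)cp);
      }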


.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Unicode-Escape and
   return a Python bytes object. Return *NULL* if an exception was raised by the
   codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsUnicodeEscapeString`.


Raw-Unicode-Escape Codecs
"""""""""""""""""""""""""

These are the "Raw Unicode Escape" codec APIs:


.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
                              Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Raw-Unicode-Escape and return the result as
   a Python bytes object. Error handling is "strict". Return *NULL* if an exception
   was raised by the codec.
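
   Unlike Unicode-Escape, this encoding leaves every code point below 0x100
   (including the backslash) as a single raw byte and only escapes the rest. A
   minimal standalone sketch of that rule for one code point (a hypothetical
   helper, not CPython's encoder)::

      #include <stdio.h>
      #include <stdint.h>

      /* Hypothetical helper: write the Raw-Unicode-Escape form of cp into out
         (11 bytes needed, NUL-terminated) and return the number of bytes. */
      int raw_unicode_escape_cp(uint32_t cp, char *out)
      {
          if (cp < 0x100) {
              out[0] = (char)(unsigned char)cp;        /* one Latin-1 byte */
              out[1] = '\0';
              return 1;
          }
          if (cp < 0x10000)
              return sprintf(out, "\\u%04x", (unsigned)cp);
          return sprintf(out, "\\U%08lx", (unsigned long)cp);
      }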


.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
                              Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
   and return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsRawUnicodeEscapeString`.


Latin-1 Codecs
""""""""""""""

These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode
ordinals and only these are accepted by the codecs during encoding.
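
Because each byte value equals its code point, decoding cannot fail; only
encoding can. A minimal standalone sketch of strict Latin-1 encoding over an
array of code points (the helper name is hypothetical, not part of the C API)::

   #include <stddef.h>
   #include <stdint.h>

   /* Hypothetical helper: encode cps[0..n) into out.  Returns n on success,
      or -1 and stores the index of the first unencodable code point in
      *err_pos. */
   long latin1_encode(const uint32_t *cps, size_t n, unsigned char *out,
                      size_t *err_pos)
   {
       for (size_t i = 0; i < n; i++) {
           if (cps[i] > 0xFF) {          /* outside the first 256 ordinals */
               *err_pos = i;
               return -1;
           }
           out[i] = (unsigned char)cps[i];
       }
       return (long)n;
   }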


.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)

   Encode a Unicode object using Latin-1 and return the result as a Python bytes
   object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using Latin-1 and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsLatin1String`.


ASCII Codecs
""""""""""""

These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
codes generate errors.
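
In other words, a byte is valid only if its high bit is clear. A minimal
standalone sketch (a hypothetical helper, not part of the C API) that locates the
first byte a strict ASCII decode would reject::

   #include <stddef.h>

   /* Hypothetical helper: return the index of the first byte outside the
      7-bit range, or size if every byte is ASCII. */
   size_t ascii_first_invalid(const unsigned char *s, size_t size)
   {
       for (size_t i = 0; i < size; i++)
           if (s[i] & 0x80)              /* high bit set: not ASCII */
               return i;
       return size;
   }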


.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)

   Encode a Unicode object using ASCII and return the result as a Python bytes
   object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using ASCII and
   return a Python bytes object. Return *NULL* if an exception was raised by
   the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsASCIIString`.


Character Map Codecs
""""""""""""""""""""

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.

Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or ``None``
(meaning "undefined mapping" and causing an error).

Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or ``None``
(meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode or
Latin-1 ordinal, respectively. Because of this, mappings only need to contain
those mappings which map characters to different code points.

These are the mapping codec APIs:

.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object. Return *NULL* if an exception was raised by the
   codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise it can
   be a dictionary mapping byte values to characters, or a Unicode string, which
   is treated as a lookup table. Byte values greater than or equal to the length
   of the string, and U+FFFE "characters", are treated as "undefined mapping".
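
   The lookup-table case can be modelled with a minimal standalone sketch, where an
   array of code points stands in for the Unicode string passed as *mapping*. The
   helper name is hypothetical; it stops at the first undefined byte, as "strict"
   error handling would::

      #include <stddef.h>
      #include <stdint.h>

      /* Hypothetical helper: decode s[0..size) through table[0..table_len).
         Returns the number of code points written, or -1 with *err_pos set
         to the first byte that has no defined mapping. */
      long charmap_decode(const unsigned char *s, size_t size,
                          const uint32_t *table, size_t table_len,
                          uint32_t *out, size_t *err_pos)
      {
          for (size_t i = 0; i < size; i++) {
              /* Past the end of the table, or U+FFFE: undefined mapping. */
              if (s[i] >= table_len || table[s[i]] == 0xFFFE) {
                  *err_pos = i;
                  return -1;
              }
              out[i] = table[s[i]];
          }
          return (long)size;
      }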


.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

   Encode a Unicode object using the given *mapping* object and return the result
   as a Python bytes object. Error handling is "strict". Return *NULL* if an
   exception was raised by the codec.

The following codec API is special in that it maps Unicode to Unicode.


.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *table, const char *errors)

   Translate a :c:type:`Py_UNICODE` buffer of the given *size* by applying a
   character mapping *table* to it and return the resulting Unicode object. Return
   *NULL* when an exception was raised by the codec.

   The mapping table must map Unicode ordinal integers to Unicode ordinal
   integers or ``None`` (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API.

   .. XXX replace with what?


.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
                              PyObject *mapping, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return a Python bytes object. Return *NULL* if an
   exception was raised by the codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsCharmapString`.


MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.

.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return *NULL* if an exception was raised by the codec.


.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, \
                              const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :c:func:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.


.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python bytes
   object. Error handling is "strict". Return *NULL* if an exception was
   raised by the codec.


.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)

   Encode the Unicode object using the specified code page and return a Python
   bytes object. Return *NULL* if an exception was raised by the codec. Use the
   :c:data:`CP_ACP` code page to get the MBCS encoder.

   .. versionadded:: 3.3


.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :c:type:`Py_UNICODE` buffer of the given *size* using MBCS and return
   a Python bytes object. Return *NULL* if an exception was raised by the
   codec.

   .. deprecated-removed:: 3.3 4.0
      Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
      :c:func:`PyUnicode_AsMBCSString` or :c:func:`PyUnicode_EncodeCodePage`.


Methods & Slots
"""""""""""""""


.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

They all return *NULL* or ``-1`` if an exception occurs.


.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string giving a list of Unicode strings. If *sep* is *NULL*, splitting
   will be done at all whitespace substrings. Otherwise, splits occur at the given
   separator. At most *maxsplit* splits will be done. If negative, no limit is
   set. Separators are not included in the resulting list.


.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break. If *keepend* is ``0``, the line break
   characters are not included in the resulting strings.
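
   A minimal standalone sketch of the *keepend* and CRLF rules on a byte string.
   It is hypothetical and deliberately limited to ``\n``, ``\r`` and ``\r\n``;
   Python's ``str.splitlines()`` also recognises other boundaries such as
   U+2028::

      #include <stddef.h>

      /* Hypothetical helper: store the length of each line of s[0..n) in
         lens[], counting the line break only when keepend is nonzero.
         Returns the number of lines (a final break adds no empty line). */
      size_t split_lines(const char *s, size_t n, int keepend, size_t *lens)
      {
          size_t count = 0, i = 0;
          while (i < n) {
              size_t start = i;
              while (i < n && s[i] != '\n' && s[i] != '\r')
                  i++;
              size_t eol = i;                      /* end of the line text */
              if (i < n) {
                  if (s[i] == '\r' && i + 1 < n && s[i + 1] == '\n')
                      i += 2;                      /* CRLF: one line break */
                  else
                      i += 1;                      /* lone CR or LF */
              }
              lens[count++] = keepend ? i - start : eol - start;
          }
          return count;
      }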


.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \
                              const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or ``None`` (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be *NULL*, which indicates
   that the default error handling should be used.
				1571
				1572
.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given *separator* and return the resulting
   Unicode string.


.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
                        Py_ssize_t start, Py_ssize_t end, int direction)

   Return ``1`` if *substr* matches ``str[start:end]`` at the given tail end,
   ``0`` otherwise, and ``-1`` if an error occurred.  A *direction* of ``-1``
   requests a prefix match and a *direction* of ``1`` a suffix match.


.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
                       Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of *substr* in ``str[start:end]`` using the given
   *direction* (*direction* == ``1`` means to do a forward search, *direction* ==
   ``-1`` a backward search).  The return value is the index of the first match;
   a value of ``-1`` indicates that no match was found, and ``-2`` indicates that
   an error occurred and an exception has been set.


.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
                       Py_ssize_t start, Py_ssize_t end, int direction)

   Return the first position of the character *ch* in ``str[start:end]`` using
   the given *direction* (*direction* == ``1`` means to do a forward search,
   *direction* == ``-1`` a backward search).  The return value is the index of the
   first match; a value of ``-1`` indicates that no match was found, and ``-2``
   indicates that an error occurred and an exception has been set.

   .. versionadded:: 3.3


.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
                       Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``.  Return ``-1`` if an error occurred.


.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
                       PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object.  If *maxcount* is ``-1``, all occurrences
   are replaced.


.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return ``-1``, ``0``, ``1`` for less than, equal, and
   greater than, respectively.  ``-1`` is also returned if an error occurred, so
   use :c:func:`PyErr_Occurred` to distinguish an error from a "less than" result.


.. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, const char *string)

   Compare a Unicode object, *uni*, with *string* and return ``-1``, ``0``, ``1``
   for less than, equal, and greater than, respectively.  It is best to pass only
   ASCII-encoded strings, but the function interprets the input string as
   ISO-8859-1 if it contains non-ASCII characters.


.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   :exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
   with a :exc:`UnicodeDecodeError`.

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.


.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``.


.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Return ``1`` if *element* is contained in *container* and ``0`` otherwise.

   *element* has to coerce to a one-element Unicode string.  ``-1`` is returned
   if there was an error.


.. c:function:: void PyUnicode_InternInPlace(PyObject **string)

   Intern the argument *\*string* in place.  The argument must be the address of a
   pointer variable pointing to a Python Unicode string object.  If there is an
   existing interned string that is the same as *\*string*, it sets *\*string* to
   it (decrementing the reference count of the old string object and incrementing
   the reference count of the interned string object), otherwise it leaves
   *\*string* alone and interns it (incrementing its reference count).
   (Clarification: even though there is a lot of talk about reference counts, think
   of this function as reference-count-neutral; you own the object after the call
   if and only if you owned it before the call.)


.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)

   A combination of :c:func:`PyUnicode_FromString` and
   :c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
   object that has been interned, or a new ("owned") reference to an earlier
   interned string object with the same value.
