Blame - Doc/c-api/unicode.rst - platform/external/python/cpython3

Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1	.. highlightlang:: c
				2
				3	.. _unicodeobjects:
				4
				5	Unicode Objects and Codecs
				6	--------------------------
				7
Antoine Pitrou	fbd4f80	2012-08-11 16:51:50 +0200	[diff] [blame]	8	.. sectionauthor:: Marc-André Lemburg <mal@lemburg.com>
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	9	.. sectionauthor:: Georg Brandl <georg@python.org>
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	10
				11	Unicode Objects
				12	^^^^^^^^^^^^^^^
				13
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	14	Since the implementation of :pep:`393` in Python 3.3, Unicode objects internally
				15	use a variety of representations, in order to allow handling the complete range
				16	of Unicode characters while staying memory efficient. There are special cases
				17	for strings where all code points are below 128, 256, or 65536; otherwise, code
				18	points must be below 1114112 (which is the full Unicode range).
				19
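The size classes described above can be illustrated without the C API. The following is a minimal sketch: ``bytes_per_code_point`` is a hypothetical helper, not a CPython function, that maps the largest code point of a string to the character width implied by the thresholds 128, 256 and 65536.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of CPython): pick the narrowest character
   width, in bytes, able to hold every code point up to max_code_point. */
static int
bytes_per_code_point(uint32_t max_code_point)
{
    assert(max_code_point < 1114112);   /* must lie inside the Unicode range */
    if (max_code_point < 256)
        return 1;                       /* covers both the <128 and <256 cases */
    if (max_code_point < 65536)
        return 2;
    return 4;
}
```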
				20	:c:type:`Py_UNICODE*` and UTF-8 representations are created on demand and cached
Antoine Pitrou	b965b39	2011-10-22 22:08:05 +0200	[diff] [blame]	21	in the Unicode object. The :c:type:`Py_UNICODE*` representation is deprecated
				22	and inefficient; it should be avoided in performance- or memory-sensitive
				23	situations.
				24
				25	Due to the transition between the old APIs and the new APIs, unicode objects
				26	can internally be in two states depending on how they were created:
				27
				28	* "canonical" unicode objects are all objects created by a non-deprecated
				29	unicode API. They use the most efficient representation allowed by the
				30	implementation.
				31
				32	* "legacy" unicode objects have been created through one of the deprecated
				33	APIs (typically :c:func:`PyUnicode_FromUnicode`) and only bear the
				34	:c:type:`Py_UNICODE*` representation; you will have to call
				35	:c:func:`PyUnicode_READY` on them before calling any other API.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	36
				37
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	38	Unicode Type
				39	""""""""""""
				40
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	41	These are the basic Unicode object types used for the Unicode implementation in
				42	Python:
				43
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	44	.. c:type:: Py_UCS4
				45	Py_UCS2
				46	Py_UCS1
				47
				48	These types are typedefs for unsigned integer types wide enough to contain
				49	characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with
				50	single Unicode characters, use :c:type:`Py_UCS4`.
				51
				52	.. versionadded:: 3.3
				53
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	54
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	55	.. c:type:: Py_UNICODE
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	56
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	57	This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
				58	depending on the platform.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	59
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	60	.. versionchanged:: 3.3
				61	In previous versions, this was a 16-bit type or a 32-bit type depending on
				62	whether you selected a "narrow" or "wide" Unicode version of Python at
				63	build time.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	64
				65
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	66	.. c:type:: PyASCIIObject
				67	PyCompactUnicodeObject
				68	PyUnicodeObject
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	69
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	70	These subtypes of :c:type:`PyObject` represent a Python Unicode object. In
				71	almost all cases, they shouldn't be used directly, since all API functions
				72	that deal with Unicode objects take and return :c:type:`PyObject` pointers.
				73
				74	.. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	75
				76
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	77	.. c:var:: PyTypeObject PyUnicode_Type
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	78
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	79	This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	80	is exposed to Python code as ``str``.
				81
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	82
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	83	The following APIs are really C macros and can be used to do fast checks and to
				84	access internal read-only data of Unicode objects:
				85
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	86	.. c:function:: int PyUnicode_Check(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	87
				88	Return true if the object o is a Unicode object or an instance of a Unicode
				89	subtype.
				90
				91
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	92	.. c:function:: int PyUnicode_CheckExact(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	93
				94	Return true if the object o is a Unicode object, but not an instance of a
				95	subtype.
				96
				97
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	98	.. c:function:: int PyUnicode_READY(PyObject *o)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	99
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	100	Ensure the string object o is in the "canonical" representation. This is
				101	required before using any of the access macros described below.
				102
				103	.. XXX expand on when it is not required
				104
				105	Returns 0 on success and -1 with an exception set on failure, which in
				106	particular happens if memory allocation fails.
				107
				108	.. versionadded:: 3.3
				109
				110
				111	.. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o)
				112
				113	Return the length of the Unicode string, in code points. o has to be a
				114	Unicode object in the "canonical" representation (not checked).
				115
				116	.. versionadded:: 3.3
				117
				118
				119	.. c:function:: Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o)
				120	Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o)
				121	Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o)
				122
				123	Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4
				124	integer types for direct character access. No check is made that the
				125	canonical representation has the matching character size; use
Martin v. Löwis	2da16e6	2011-10-07 20:58:00 +0200	[diff] [blame]	126	:c:func:`PyUnicode_KIND` to select the right macro. Make sure
Martin v. Löwis	c47adb0	2011-10-07 20:55:35 +0200	[diff] [blame]	127	:c:func:`PyUnicode_READY` has been called before accessing this.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	128
				129	.. versionadded:: 3.3
				130
				131
Victor Stinner	b4938aa	2011-11-20 18:27:28 +0100	[diff] [blame]	132	.. c:macro:: PyUnicode_WCHAR_KIND
				133	PyUnicode_1BYTE_KIND
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	134	PyUnicode_2BYTE_KIND
				135	PyUnicode_4BYTE_KIND
				136
				137	Return values of the :c:func:`PyUnicode_KIND` macro.
				138
				139	.. versionadded:: 3.3
				140
				141
				142	.. c:function:: int PyUnicode_KIND(PyObject *o)
				143
				144	Return one of the PyUnicode kind constants (see above) that indicate how many
				145	bytes per character this Unicode object uses to store its data. o has to
				146	be a Unicode object in the "canonical" representation (not checked).
				147
				148	.. XXX document "0" return value?
				149
				150	.. versionadded:: 3.3
				151
				152
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	153	.. c:function:: void* PyUnicode_DATA(PyObject *o)
				154
				155	Return a void pointer to the raw unicode buffer. o has to be a Unicode
				156	object in the "canonical" representation (not checked).
				157
				158	.. versionadded:: 3.3
				159
				160
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	161	.. c:function:: void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, \
				162	Py_UCS4 value)
				163
				164	Write into a canonical representation data (as obtained with
				165	:c:func:`PyUnicode_DATA`). This macro does not do any sanity checks and is
				166	intended for usage in loops. The caller should cache the kind value and
				167	data pointer as obtained from other macro calls. index is the index in
				168	the string (starts at 0) and value is the new code point value which should
				169	be written to that location.
				170
				171	.. versionadded:: 3.3
				172
				173
				174	.. c:function:: Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index)
				175
				176	Read a code point from a canonical representation data (as obtained with
				177	:c:func:`PyUnicode_DATA`). No checks or ready calls are performed.
				178
				179	.. versionadded:: 3.3
				180
				181
				182	.. c:function:: Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index)
				183
				184	Read a character from a Unicode object o, which must be in the "canonical"
				185	representation. This is less efficient than :c:func:`PyUnicode_READ` if you
				186	do multiple consecutive reads.
				187
				188	.. versionadded:: 3.3
				189
				190
				191	.. c:function:: Py_UCS4 PyUnicode_MAX_CHAR_VALUE(PyObject *o)
				192
				193	Return the maximum code point that is suitable for creating another string
				194	based on o, which must be in the "canonical" representation. This is
				195	always an approximation but more efficient than iterating over the string.
				196
				197	.. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	198
Christian Heimes	a156e09	2008-02-16 07:38:31 +0000	[diff] [blame]	199
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	200	.. c:function:: int PyUnicode_ClearFreeList()
Christian Heimes	a156e09	2008-02-16 07:38:31 +0000	[diff] [blame]	201
				202	Clear the free list. Return the total number of freed items.
				203
Alexandre Vassalotti	6d3dfc3	2009-07-29 19:54:39 +0000	[diff] [blame]	204
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	205	.. c:function:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
				206
				207	Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
				208	code units (this includes surrogate pairs as 2 units). o has to be a
				209	Unicode object (not checked).
				210
				211	.. deprecated-removed:: 3.3 4.0
				212	Part of the old-style Unicode API, please migrate to using
				213	:c:func:`PyUnicode_GET_LENGTH`.
				214
				215
				216	.. c:function:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
				217
				218	Return the size of the deprecated :c:type:`Py_UNICODE` representation in
				219	bytes. o has to be a Unicode object (not checked).
				220
				221	.. deprecated-removed:: 3.3 4.0
				222	Part of the old-style Unicode API, please migrate to using
				223	:c:func:`PyUnicode_GET_LENGTH`.
				224
				225
				226	.. c:function:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
				227	const char* PyUnicode_AS_DATA(PyObject *o)
				228
				229	Return a pointer to a :c:type:`Py_UNICODE` representation of the object. The
				230	``AS_DATA`` form casts the pointer to :c:type:`const char *`. o has to be
				231	a Unicode object (not checked).
				232
				233	.. versionchanged:: 3.3
				234	This macro is now inefficient -- because in many cases the
				235	:c:type:`Py_UNICODE` representation does not exist and needs to be created
				236	-- and can fail (return NULL with an exception set). Try to port the
				237	code to use the new :c:func:`PyUnicode_nBYTE_DATA` macros or use
				238	:c:func:`PyUnicode_WRITE` or :c:func:`PyUnicode_READ`.
				239
				240	.. deprecated-removed:: 3.3 4.0
				241	Part of the old-style Unicode API, please migrate to using the
				242	:c:func:`PyUnicode_nBYTE_DATA` family of macros.
				243
				244
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	245	Unicode Character Properties
				246	""""""""""""""""""""""""""""
				247
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	248	Unicode provides many different character properties. The most often needed ones
				249	are available through these macros which are mapped to C functions depending on
				250	the Python configuration.
				251
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	252
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	253	.. c:function:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	254
				255	Return 1 or 0 depending on whether ch is a whitespace character.
				256
				257
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	258	.. c:function:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	259
				260	Return 1 or 0 depending on whether ch is a lowercase character.
				261
				262
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	263	.. c:function:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	264
				265	Return 1 or 0 depending on whether ch is an uppercase character.
				266
				267
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	268	.. c:function:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	269
				270	Return 1 or 0 depending on whether ch is a titlecase character.
				271
				272
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	273	.. c:function:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	274
				275	Return 1 or 0 depending on whether ch is a linebreak character.
				276
				277
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	278	.. c:function:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	279
				280	Return 1 or 0 depending on whether ch is a decimal character.
				281
				282
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	283	.. c:function:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	284
				285	Return 1 or 0 depending on whether ch is a digit character.
				286
				287
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	288	.. c:function:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	289
				290	Return 1 or 0 depending on whether ch is a numeric character.
				291
				292
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	293	.. c:function:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	294
				295	Return 1 or 0 depending on whether ch is an alphabetic character.
				296
				297
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	298	.. c:function:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	299
				300	Return 1 or 0 depending on whether ch is an alphanumeric character.
				301
Georg Brandl	559e5d7	2008-06-11 18:37:52 +0000	[diff] [blame]	302
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	303	.. c:function:: int Py_UNICODE_ISPRINTABLE(Py_UNICODE ch)
Georg Brandl	559e5d7	2008-06-11 18:37:52 +0000	[diff] [blame]	304
				305	Return 1 or 0 depending on whether ch is a printable character.
				306	Nonprintable characters are those characters defined in the Unicode character
				307	database as "Other" or "Separator", excepting the ASCII space (0x20) which is
				308	considered printable. (Note that printable characters in this context are
				309	those which should not be escaped when :func:`repr` is invoked on a string.
				310	It has no bearing on the handling of strings written to :data:`sys.stdout` or
				311	:data:`sys.stderr`.)
				312
				313
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	314	These APIs can be used for fast direct character conversions:
				315
				316
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	317	.. c:function:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	318
				319	Return the character ch converted to lower case.
				320
Benjamin Peterson	b2bf01d	2012-01-11 18:17:06 -0500	[diff] [blame]	321	.. deprecated:: 3.3
				322	This function uses simple case mappings.
				323
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	324
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	325	.. c:function:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	326
				327	Return the character ch converted to upper case.
				328
Benjamin Peterson	b2bf01d	2012-01-11 18:17:06 -0500	[diff] [blame]	329	.. deprecated:: 3.3
				330	This function uses simple case mappings.
				331
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	332
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	333	.. c:function:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	334
				335	Return the character ch converted to title case.
				336
Benjamin Peterson	b2bf01d	2012-01-11 18:17:06 -0500	[diff] [blame]	337	.. deprecated:: 3.3
				338	This function uses simple case mappings.
				339
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	340
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	341	.. c:function:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	342
				343	Return the character ch converted to a decimal positive integer. Return
				344	``-1`` if this is not possible. This macro does not raise exceptions.
				345
				346
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	347	.. c:function:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	348
				349	Return the character ch converted to a single digit integer. Return ``-1`` if
				350	this is not possible. This macro does not raise exceptions.
				351
				352
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	353	.. c:function:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	354
				355	Return the character ch converted to a double. Return ``-1.0`` if this is not
				356	possible. This macro does not raise exceptions.
				357
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	358
Ezio Melotti	8c9375b	2011-08-22 20:03:25 +0300	[diff] [blame]	359	These APIs can be used to work with surrogates:
				360
				361	.. c:macro:: Py_UNICODE_IS_SURROGATE(ch)
				362
				363	Check if ch is a surrogate (``0xD800 <= ch <= 0xDFFF``).
				364
				365	.. c:macro:: Py_UNICODE_IS_HIGH_SURROGATE(ch)
				366
				367	Check if ch is a high surrogate (``0xD800 <= ch <= 0xDBFF``).
				368
				369	.. c:macro:: Py_UNICODE_IS_LOW_SURROGATE(ch)
				370
				371	Check if ch is a low surrogate (``0xDC00 <= ch <= 0xDFFF``).
				372
				373	.. c:macro:: Py_UNICODE_JOIN_SURROGATES(high, low)
				374
				375	Join two surrogate characters and return a single Py_UCS4 value.
				376	high and low are respectively the leading and trailing surrogates in a
				377	surrogate pair.
				378
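The arithmetic behind joining a surrogate pair can be written out in plain C. The functions below are hypothetical stand-ins that use the same ranges as the macros above; they are not the CPython definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the documented range checks. */
static int is_high_surrogate(uint32_t ch) { return 0xD800 <= ch && ch <= 0xDBFF; }
static int is_low_surrogate(uint32_t ch)  { return 0xDC00 <= ch && ch <= 0xDFFF; }

/* Combine a leading (high) and a trailing (low) surrogate into one code
   point: each surrogate contributes 10 bits, offset by 0x10000. */
static uint32_t
join_surrogates(uint32_t high, uint32_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000 + (((high & 0x03FF) << 10) | (low & 0x03FF));
}
```

For example, the pair ``0xD83D``, ``0xDE00`` joins to the code point ``0x1F600``.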
				379
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	380	Creating and accessing Unicode strings
				381	""""""""""""""""""""""""""""""""""""""
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	382
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	383	To create Unicode objects and access their basic sequence properties, use these
				384	APIs:
				385
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	386	.. c:function:: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	387
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	388	Create a new Unicode object. maxchar should be the true maximum code point
				389	to be placed in the string. As an approximation, it can be rounded up to the
				390	nearest value in the sequence 127, 255, 65535, 1114111.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	391
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	392	This is the recommended way to allocate a new Unicode object. Objects
				393	created using this function are not resizable.
				394
				395	.. versionadded:: 3.3
				396
				397
				398	.. c:function:: PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, \
				399	Py_ssize_t size)
				400
				401	Create a new Unicode object with the given kind (possible values are
				402	:c:macro:`PyUnicode_1BYTE_KIND` etc., as returned by
				403	:c:func:`PyUnicode_KIND`). The buffer must point to an array of size
				404	units of 1, 2 or 4 bytes per character, as given by the kind.
				405
				406	.. versionadded:: 3.3
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	407
				408
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	409	.. c:function:: PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	410
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	411	Create a Unicode object from the char buffer u. The bytes will be
				412	interpreted as being UTF-8 encoded. The buffer is copied into the new
				413	object. If the buffer is not NULL, the return value might be a shared
				414	object, i.e. modification of the data is not allowed.
				415
				416	If u is NULL, this function behaves like :c:func:`PyUnicode_FromUnicode`
				417	with the buffer set to NULL. This usage is deprecated in favor of
				418	:c:func:`PyUnicode_New`.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	419
				420
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	421	.. c:function:: PyObject* PyUnicode_FromString(const char *u)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	422
				423	Create a Unicode object from a UTF-8 encoded null-terminated char buffer
				424	u.
				425
				426
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	427	.. c:function:: PyObject* PyUnicode_FromFormat(const char *format, ...)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	428
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	429	Take a C :c:func:`printf`\ -style format string and a variable number of
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	430	arguments, calculate the size of the resulting Python unicode string and return
				431	a string with the values formatted into it. The variable arguments must be C
				432	types and must correspond exactly to the format characters in the format
Victor Stinner	1205f27	2010-09-11 00:54:47 +0000	[diff] [blame]	433	ASCII-encoded string. The following format characters are allowed:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	434
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	435	.. % This should be exactly the same as the table in PyErr_Format.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	436	.. % The descriptions for %zd and %zu are wrong, but the truth is complicated
				437	.. % because not all compilers support the %z width modifier -- we fake it
				438	.. % when necessary via interpolating PY_FORMAT_SIZE_T.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	439	.. % Similar comments apply to the %ll width modifier and
				440	.. % PY_FORMAT_LONG_LONG.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	441
Georg Brandl	44ea77b	2013-03-28 13:28:44 +0100	[diff] [blame]	442	.. tabularcolumns:: |l|l|L|
				443
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	444	+-------------------+---------------------+--------------------------------+
				445	| Format Characters | Type                | Comment                        |
				446	+===================+=====================+================================+
				447	| :attr:`%%`        | n/a                 | The literal % character.       |
				448	+-------------------+---------------------+--------------------------------+
				449	| :attr:`%c`        | int                 | A single character,            |
				450	|                   |                     | represented as a C int.        |
				451	+-------------------+---------------------+--------------------------------+
				452	| :attr:`%d`        | int                 | Exactly equivalent to          |
				453	|                   |                     | ``printf("%d")``.              |
				454	+-------------------+---------------------+--------------------------------+
				455	| :attr:`%u`        | unsigned int        | Exactly equivalent to          |
				456	|                   |                     | ``printf("%u")``.              |
				457	+-------------------+---------------------+--------------------------------+
				458	| :attr:`%ld`       | long                | Exactly equivalent to          |
				459	|                   |                     | ``printf("%ld")``.             |
				460	+-------------------+---------------------+--------------------------------+
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	461	| :attr:`%li`       | long                | Exactly equivalent to          |
				462	|                   |                     | ``printf("%li")``.             |
				463	+-------------------+---------------------+--------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	464	| :attr:`%lu`       | unsigned long       | Exactly equivalent to          |
				465	|                   |                     | ``printf("%lu")``.             |
				466	+-------------------+---------------------+--------------------------------+
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	467	| :attr:`%lld`      | long long           | Exactly equivalent to          |
				468	|                   |                     | ``printf("%lld")``.            |
				469	+-------------------+---------------------+--------------------------------+
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	470	| :attr:`%lli`      | long long           | Exactly equivalent to          |
				471	|                   |                     | ``printf("%lli")``.            |
				472	+-------------------+---------------------+--------------------------------+
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	473	| :attr:`%llu`      | unsigned long long  | Exactly equivalent to          |
				474	|                   |                     | ``printf("%llu")``.            |
				475	+-------------------+---------------------+--------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	476	| :attr:`%zd`       | Py_ssize_t          | Exactly equivalent to          |
				477	|                   |                     | ``printf("%zd")``.             |
				478	+-------------------+---------------------+--------------------------------+
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	479	| :attr:`%zi`       | Py_ssize_t          | Exactly equivalent to          |
				480	|                   |                     | ``printf("%zi")``.             |
				481	+-------------------+---------------------+--------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	482	| :attr:`%zu`       | size_t              | Exactly equivalent to          |
				483	|                   |                     | ``printf("%zu")``.             |
				484	+-------------------+---------------------+--------------------------------+
				485	| :attr:`%i`        | int                 | Exactly equivalent to          |
				486	|                   |                     | ``printf("%i")``.              |
				487	+-------------------+---------------------+--------------------------------+
				488	| :attr:`%x`        | int                 | Exactly equivalent to          |
				489	|                   |                     | ``printf("%x")``.              |
				490	+-------------------+---------------------+--------------------------------+
				491	| :attr:`%s`        | char\*              | A null-terminated C character  |
				492	|                   |                     | array.                         |
				493	+-------------------+---------------------+--------------------------------+
				494	| :attr:`%p`        | void\*              | The hex representation of a C  |
				495	|                   |                     | pointer. Mostly equivalent to  |
				496	|                   |                     | ``printf("%p")`` except that   |
				497	|                   |                     | it is guaranteed to start with |
				498	|                   |                     | the literal ``0x`` regardless  |
				499	|                   |                     | of what the platform's         |
				500	|                   |                     | ``printf`` yields.             |
				501	+-------------------+---------------------+--------------------------------+
Georg Brandl	559e5d7	2008-06-11 18:37:52 +0000	[diff] [blame]	502	| :attr:`%A`        | PyObject\*          | The result of calling          |
				503	|                   |                     | :func:`ascii`.                 |
				504	+-------------------+---------------------+--------------------------------+
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	505	| :attr:`%U`        | PyObject\*          | A unicode object.              |
				506	+-------------------+---------------------+--------------------------------+
				507	| :attr:`%V`        | PyObject\*, char \* | A unicode object (which may be |
				508	|                   |                     | NULL) and a null-terminated    |
				509	|                   |                     | C character array as a second  |
				510	|                   |                     | parameter (which will be used, |
				511	|                   |                     | if the first parameter is      |
				512	|                   |                     | NULL).                         |
				513	+-------------------+---------------------+--------------------------------+
				514	| :attr:`%S`        | PyObject\*          | The result of calling          |
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	515	|                   |                     | :c:func:`PyObject_Str`.        |
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	516	+-------------------+---------------------+--------------------------------+
				517	| :attr:`%R`        | PyObject\*          | The result of calling          |
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	518	|                   |                     | :c:func:`PyObject_Repr`.       |
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	519	+-------------------+---------------------+--------------------------------+
				520
				521	An unrecognized format character causes all the rest of the format string to be
				522	copied as-is to the result string, and any extra arguments discarded.
				523
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	524	.. note::
				525
				526	The ``"%lld"`` and ``"%llu"`` format specifiers are only available
Georg Brandl	ef871f6	2010-03-12 10:06:40 +0000	[diff] [blame]	527	when :const:`HAVE_LONG_LONG` is defined.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	528
Victor Stinner	8cecc8c	2013-05-06 23:11:54 +0200	[diff] [blame]	529	.. note::
				530	The width formatter unit is a number of characters rather than bytes.
				531	The precision formatter unit is a number of bytes for ``"%s"`` and
				532	``"%V"`` (if the ``PyObject*`` argument is NULL), and a number of
				533	characters for ``"%A"``, ``"%U"``, ``"%S"``, ``"%R"`` and ``"%V"``
				534	(if the ``PyObject*`` argument is not NULL).
				535
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	536	.. versionchanged:: 3.2
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	537	Support for ``"%lld"`` and ``"%llu"`` added.
Mark Dickinson	6ce4a9a	2009-11-16 17:00:11 +0000	[diff] [blame]	538
Victor Stinner	0fbe226	2011-03-02 00:10:34 +0000	[diff] [blame]	539	.. versionchanged:: 3.3
				540	Support for ``"%li"``, ``"%lli"`` and ``"%zi"`` added.
				541
Victor Stinner	8cecc8c	2013-05-06 23:11:54 +0200	[diff] [blame]	542	.. versionchanged:: 3.4
				543	Support for width and precision formatters for ``"%s"``, ``"%A"``, ``"%U"``,
				544	``"%V"``, ``"%S"``, ``"%R"`` added.
				545
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	546
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	547	.. c:function:: PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	548
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	549	Identical to :c:func:`PyUnicode_FromFormat` except that it takes exactly two
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	550	arguments.
				551
Alexander Belopolsky	942af5a	2010-12-04 03:38:46 +0000	[diff] [blame]	552
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	553	.. c:function:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, \
				554	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	555
				556	Coerce an encoded object *obj* to a Unicode object and return a reference with
				557	incremented refcount.
				558
Georg Brandl	952867a	2010-06-27 10:17:12 +0000	[diff] [blame]	559	:class:`bytes`, :class:`bytearray` and other char buffer compatible objects
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	560	are decoded according to the given encoding and using the error handling
				561	defined by errors. Both can be NULL to have the interface use the default
Georg Brandl	952867a	2010-06-27 10:17:12 +0000	[diff] [blame]	562	values (see the next section for details).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	563
				564	All other objects, including Unicode objects, cause a :exc:`TypeError` to be
				565	set.
				566
				567	The API returns NULL if there was an error. The caller is responsible for
				568	decref'ing the returned objects.
				569
				570
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	571	.. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)
				572
				573	Return the length of the Unicode object, in code points.
				574
				575	.. versionadded:: 3.3
				576
				577
				578	.. c:function:: int PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, \
Serhiy Storchaka	cdd0279	2013-08-08 16:47:43 +0300	[diff] [blame]	579	PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many)
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	580
				581	Copy characters from one Unicode object into another. This function performs
				582	character conversion when necessary and falls back to :c:func:`memcpy` if
				583	possible. Returns ``-1`` and sets an exception on error, otherwise returns
				584	``0``.
				585
				586	.. versionadded:: 3.3
				587
				588
Victor Stinner	606e19d	2012-01-04 03:59:16 +0100	[diff] [blame]	589	.. c:function:: Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, \
Victor Stinner	3fe5531	2012-01-04 00:33:50 +0100	[diff] [blame]	590	Py_ssize_t length, Py_UCS4 fill_char)
				591
				592	Fill a string with a character: write fill_char into
				593	``unicode[start:start+length]``.
				594
				595	Fail if *fill_char* is bigger than the string's maximum character, or if the
				596	string has more than one reference.
				597
				598	Return the number of characters written, or return ``-1`` and raise an
				599	exception on error.
				600
				601	.. versionadded:: 3.3
				602
				603
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	604	.. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
				605	Py_UCS4 character)
				606
				607	Write a character to a string. The string must have been created through
				608	:c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
				609	the string must not be shared, or have been hashed yet.
				610
				611	This function checks that unicode is a Unicode object, that the index is
				612	not out of bounds, and that the object can be modified safely (i.e. that
				613	its reference count is one), in contrast to the macro version
				614	:c:func:`PyUnicode_WRITE_CHAR`.
				615
				616	.. versionadded:: 3.3
				617
				618
				619	.. c:function:: Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index)
				620
				621	Read a character from a string. This function checks that unicode is a
				622	Unicode object and the index is not out of bounds, in contrast to the macro
				623	version :c:func:`PyUnicode_READ_CHAR`.
				624
				625	.. versionadded:: 3.3
				626
				627
				628	.. c:function:: PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, \
				629	Py_ssize_t end)
				630
				631	Return a substring of str, from character index start (included) to
				632	character index end (excluded). Negative indices are not supported.
				633
				634	.. versionadded:: 3.3
				635
				636
				637	.. c:function:: Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, \
				638	Py_ssize_t buflen, int copy_null)
				639
				640	Copy the string *u* into a UCS4 buffer, including a null character, if
				641	*copy_null* is set. Returns NULL and sets an exception on error (in
				642	particular, a :exc:`ValueError` if *buflen* is smaller than the length of
				643	*u*). *buffer* is returned on success.
				644
				645	.. versionadded:: 3.3
				646
				647
				648	.. c:function:: Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u)
				649
				650	Copy the string u into a new UCS4 buffer that is allocated using
				651	:c:func:`PyMem_Malloc`. If this fails, NULL is returned with a
				652	:exc:`MemoryError` set.
				653
				654	.. versionadded:: 3.3
				655
				656
				657	Deprecated Py_UNICODE APIs
				658	""""""""""""""""""""""""""
				659
				660	.. deprecated-removed:: 3.3 4.0
				661
				662	These API functions are deprecated with the implementation of :pep:`393`.
				663	Extension modules can continue using them, as they will not be removed in Python
				664	3.x, but need to be aware that their use can now cause performance and memory hits.
				665
				666
				667	.. c:function:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
				668
				669	Create a Unicode object from the Py_UNICODE buffer u of the given size. u
				670	may be NULL which causes the contents to be undefined. It is the user's
				671	responsibility to fill in the needed data. The buffer is copied into the new
				672	object.
				673
				674	If the buffer is not NULL, the return value might be a shared object.
				675	Therefore, modification of the resulting Unicode object is only allowed when
				676	u is NULL.
				677
				678	If the buffer is NULL, :c:func:`PyUnicode_READY` must be called once the
				679	string content has been filled before using any of the access macros such as
				680	:c:func:`PyUnicode_KIND`.
				681
				682	Please migrate to using :c:func:`PyUnicode_FromKindAndData` or
				683	:c:func:`PyUnicode_New`.
				684
				685
				686	.. c:function:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
				687
				688	Return a read-only pointer to the Unicode object's internal
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	689	:c:type:`Py_UNICODE` buffer, or NULL on error. This will create the
				690	:c:type:`Py_UNICODE*` representation of the object if it is not yet
				691	available. Note that the resulting :c:type:`Py_UNICODE` string may contain
				692	embedded null characters, which would cause the string to be truncated when
				693	used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	694
				695	Please migrate to using :c:func:`PyUnicode_AsUCS4`,
				696	:c:func:`PyUnicode_Substring`, :c:func:`PyUnicode_ReadChar` or similar new
				697	APIs.
				698
				699
				700	.. c:function:: PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size)
				701
				702	Create a Unicode object by replacing all decimal digits in the
				703	:c:type:`Py_UNICODE` buffer *s* of the given *size* with the ASCII digits 0--9
				704	according to their decimal value. Return NULL if an exception occurs.
				705
				706
				707	.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size)
				708
				709	Like :c:func:`PyUnicode_AsUnicode`, but also saves the :c:type:`Py_UNICODE`
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	710	array length in size. Note that the resulting :c:type:`Py_UNICODE*` string
				711	may contain embedded null characters, which would cause the string to be
				712	truncated when used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	713
				714	.. versionadded:: 3.3
				715
				716
				717	.. c:function:: Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode)
				718
				719	Create a copy of a Unicode string ending with a null character. Return NULL
				720	and raise a :exc:`MemoryError` exception on memory allocation failure,
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	721	otherwise return a new allocated buffer (use :c:func:`PyMem_Free` to free
				722	the buffer). Note that the resulting :c:type:`Py_UNICODE*` string may
				723	contain embedded null characters, which would cause the string to be
				724	truncated when used in most C functions.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	725
				726	.. versionadded:: 3.2
				727
				728	Please migrate to using :c:func:`PyUnicode_AsUCS4Copy` or similar new APIs.
				729
				730
				731	.. c:function:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
				732
				733	Return the size of the deprecated :c:type:`Py_UNICODE` representation, in
				734	code units (this includes surrogate pairs as 2 units).
				735
				736	Please migrate to using :c:func:`PyUnicode_GetLength`.
				737
				738
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	739	.. c:function:: PyObject* PyUnicode_FromObject(PyObject *obj)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	740
				741	Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
				742	throughout the interpreter whenever coercion to Unicode is needed.
				743
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	744
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	745	Locale Encoding
				746	"""""""""""""""
				747
				748	The current locale encoding can be used to decode text from the operating
				749	system.
				750
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	751	.. c:function:: PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, \
				752	Py_ssize_t len, \
				753	const char *errors)
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	754
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	755	Decode a string from the current locale encoding. The supported
				756	error handlers are ``"strict"`` and ``"surrogateescape"``
				757	(:pep:`383`). The decoder uses ``"strict"`` error handler if
Andrew Svetlov	f4c3a18	2012-11-29 15:23:15 +0200	[diff] [blame]	758	*errors* is ``NULL``. *str* must end with a null character but
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	759	cannot contain embedded null characters.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	760
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	761	Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` to decode a string from
				762	:c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
				763	Python startup).
				764
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	765	.. seealso::
				766
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	767	The :c:func:`Py_DecodeLocale` function.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	768
				769	.. versionadded:: 3.3
				770
				771
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	772	.. c:function:: PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors)
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	773
				774	Similar to :c:func:`PyUnicode_DecodeLocaleAndSize`, but compute the string
				775	length using :c:func:`strlen`.
				776
				777	.. versionadded:: 3.3
				778
				779
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	780	.. c:function:: PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors)
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	781
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	782	Encode a Unicode object to the current locale encoding. The
				783	supported error handlers are ``"strict"`` and ``"surrogateescape"``
				784	(:pep:`383`). The encoder uses ``"strict"`` error handler if
				785	*errors* is ``NULL``. Return a :class:`bytes` object. *unicode* cannot
				786	contain embedded null characters.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	787
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	788	Use :c:func:`PyUnicode_EncodeFSDefault` to encode a string to
				789	:c:data:`Py_FileSystemDefaultEncoding` (the locale encoding read at
				790	Python startup).
				791
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	792	.. seealso::
				793
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	794	The :c:func:`Py_EncodeLocale` function.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	795
				796	.. versionadded:: 3.3
				797
				798
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	799	File System Encoding
				800	""""""""""""""""""""
				801
				802	To encode and decode file names and other environment strings,
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	803	:c:data:`Py_FileSystemDefaultEncoding` should be used as the encoding, and
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	804	``"surrogateescape"`` should be used as the error handler (:pep:`383`). To
				805	encode file names during argument parsing, the ``"O&"`` converter should be
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	806	used, passing :c:func:`PyUnicode_FSConverter` as the conversion function:
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	807
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	808	.. c:function:: int PyUnicode_FSConverter(PyObject* obj, void* result)
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	809
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	810	ParseTuple converter: encode :class:`str` objects to :class:`bytes` using
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	811	:c:func:`PyUnicode_EncodeFSDefault`; :class:`bytes` objects are output as-is.
				812	result must be a :c:type:`PyBytesObject*` which must be released when it is
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	813	no longer used.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	814
				815	.. versionadded:: 3.1
				816
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	817
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	818	To decode file names during argument parsing, the ``"O&"`` converter should be
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	819	used, passing :c:func:`PyUnicode_FSDecoder` as the conversion function:
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	820
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	821	.. c:function:: int PyUnicode_FSDecoder(PyObject* obj, void* result)
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	822
				823	ParseTuple converter: decode :class:`bytes` objects to :class:`str` using
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	824	:c:func:`PyUnicode_DecodeFSDefaultAndSize`; :class:`str` objects are output
				825	as-is. result must be a :c:type:`PyUnicodeObject*` which must be released
Victor Stinner	47fcb5b	2010-08-13 23:59:58 +0000	[diff] [blame]	826	when it is no longer used.
				827
				828	.. versionadded:: 3.2
				829
Georg Brandl	67b21b7	2010-08-17 15:07:14 +0000	[diff] [blame]	830
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	831	.. c:function:: PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size)
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	832
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	833	Decode a string using :c:data:`Py_FileSystemDefaultEncoding` and the
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	834	``"surrogateescape"`` error handler, or ``"strict"`` on Windows.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	835
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	836	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				837	locale encoding.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	838
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	839	:c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
				840	locale encoding and cannot be modified later. If you need to decode a string
				841	from the current locale encoding, use
				842	:c:func:`PyUnicode_DecodeLocaleAndSize`.
				843
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	844	.. seealso::
				845
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	846	The :c:func:`Py_DecodeLocale` function.
Victor Stinner	af02e1c	2011-12-16 23:56:01 +0100	[diff] [blame]	847
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	848	.. versionchanged:: 3.2
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	849	Use ``"strict"`` error handler on Windows.
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	850
				851
				852	.. c:function:: PyObject* PyUnicode_DecodeFSDefault(const char *s)
				853
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	854	Decode a null-terminated string using :c:data:`Py_FileSystemDefaultEncoding`
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	855	and the ``"surrogateescape"`` error handler, or ``"strict"`` on Windows.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	856
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	857	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				858	locale encoding.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	859
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	860	Use :c:func:`PyUnicode_DecodeFSDefaultAndSize` if you know the string length.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	861
Victor Stinner	62165d6	2010-10-09 10:34:37 +0000	[diff] [blame]	862	.. versionchanged:: 3.2
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	863	Use ``"strict"`` error handler on Windows.
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	864
				865
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	866	.. c:function:: PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode)
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	867
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	868	Encode a Unicode object to :c:data:`Py_FileSystemDefaultEncoding` with the
Andrew Svetlov	0fe030b	2012-11-28 12:33:58 +0200	[diff] [blame]	869	``"surrogateescape"`` error handler, or ``"strict"`` on Windows, and return
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	870	:class:`bytes`. Note that the resulting :class:`bytes` object may contain
				871	null bytes.
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	872
Victor Stinner	f3170cc	2010-10-15 12:04:23 +0000	[diff] [blame]	873	If :c:data:`Py_FileSystemDefaultEncoding` is not set, fall back to the
				874	locale encoding.
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	875
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	876	:c:data:`Py_FileSystemDefaultEncoding` is initialized at startup from the
				877	locale encoding and cannot be modified later. If you need to encode a string
				878	to the current locale encoding, use :c:func:`PyUnicode_EncodeLocale`.
				879
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	880	.. seealso::
				881
Victor Stinner	f6a271a	2014-08-01 12:28:48 +0200	[diff] [blame]	882	The :c:func:`Py_EncodeLocale` function.
Victor Stinner	f2ea71f	2011-12-17 04:13:41 +0100	[diff] [blame]	883
Victor Stinner	ae6265f	2010-05-15 16:27:27 +0000	[diff] [blame]	884	.. versionadded:: 3.2
				885
				886
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	887	wchar_t Support
				888	"""""""""""""""
				889
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	890	:c:type:`wchar_t` support for platforms which support it:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	891
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	892	.. c:function:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	893
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	894	Create a Unicode object from the :c:type:`wchar_t` buffer w of the given size.
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	895	Passing -1 as the size indicates that the function must itself compute the length,
Martin v. Löwis	790465f	2008-04-05 20:41:37 +0000	[diff] [blame]	896	using wcslen.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	897	Return NULL on failure.
				898
				899
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	900	.. c:function:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	901
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	902	Copy the Unicode object contents into the :c:type:`wchar_t` buffer w. At most
				903	size :c:type:`wchar_t` characters are copied (excluding a possibly trailing
				904	0-termination character). Return the number of :c:type:`wchar_t` characters
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	905	copied or -1 in case of an error. Note that the resulting :c:type:`wchar_t*`
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	906	string may or may not be 0-terminated. It is the responsibility of the caller
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	907	to make sure that the :c:type:`wchar_t*` string is 0-terminated in case this is
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	908	required by the application. Also, note that the :c:type:`wchar_t*` string
				909	might contain null characters, which would cause the string to be truncated
				910	when used with most C functions.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	911
				912
Victor Stinner	beb4135b	2010-10-07 01:02:42 +0000	[diff] [blame]	913	.. c:function:: wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size)
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	914
				915	Convert the Unicode object to a wide character string. The output string
				916	always ends with a null character. If *size* is not NULL, write the number
Victor Stinner	1c24bd0	2010-10-02 11:03:13 +0000	[diff] [blame]	917	of wide characters (excluding the trailing 0-termination character) into
				918	*\*size*.
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	919
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	920	Returns a buffer allocated by :c:func:`PyMem_Malloc` (use
				921	:c:func:`PyMem_Free` to free it) on success. On error, returns NULL and
				922	raises a :exc:`MemoryError`; *\*size* is then undefined. Note that the
Victor Stinner	0d81c13	2011-12-18 19:30:55 +0100	[diff] [blame]	923	resulting :c:type:`wchar_t` string might contain null characters, which
Victor Stinner	6fbd525	2011-12-18 19:22:31 +0100	[diff] [blame]	924	would cause the string to be truncated when used with most C functions.
Victor Stinner	137c34c	2010-09-29 10:25:54 +0000	[diff] [blame]	925
				926	.. versionadded:: 3.2
				927
				928
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	929	UCS4 Support
				930	""""""""""""
				931
				932	.. versionadded:: 3.3
				933
				934	.. XXX are these meant to be public?
				935
				936	.. c:function:: size_t Py_UCS4_strlen(const Py_UCS4 *u)
				937	Py_UCS4* Py_UCS4_strcpy(Py_UCS4 *s1, const Py_UCS4 *s2)
				938	Py_UCS4* Py_UCS4_strncpy(Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
				939	Py_UCS4* Py_UCS4_strcat(Py_UCS4 *s1, const Py_UCS4 *s2)
				940	int Py_UCS4_strcmp(const Py_UCS4 *s1, const Py_UCS4 *s2)
				941	int Py_UCS4_strncmp(const Py_UCS4 *s1, const Py_UCS4 *s2, size_t n)
Antoine Pitrou	57735a0	2011-10-22 22:08:46 +0200	[diff] [blame]	942	Py_UCS4* Py_UCS4_strchr(const Py_UCS4 *s, Py_UCS4 c)
				943	Py_UCS4* Py_UCS4_strrchr(const Py_UCS4 *s, Py_UCS4 c)
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	944
				945	These utility functions work on strings of :c:type:`Py_UCS4` characters and
				946	otherwise behave like the C standard library functions with the same name.
				947
				948
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	949	.. _builtincodecs:
				950
				951	Built-in Codecs
				952	^^^^^^^^^^^^^^^
				953
Georg Brandl	22b3431	2009-07-26 14:54:51 +0000	[diff] [blame]	954	Python provides a set of built-in codecs which are written in C for speed. All of
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	955	these codecs are directly usable via the following functions.
				956
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	957	Many of the following APIs take two arguments encoding and errors, and they
				958	have the same semantics as the ones of the built-in :func:`str` string object
				959	constructor.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	960
Martin v. Löwis	c15bdef	2009-05-29 14:47:46 +0000	[diff] [blame]	961	Setting encoding to NULL causes the default encoding to be used
				962	which is UTF-8. The file system calls should use
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	963	:c:func:`PyUnicode_FSConverter` for encoding file names. This uses the
				964	variable :c:data:`Py_FileSystemDefaultEncoding` internally. This
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	965	variable should be treated as read-only: on some systems, it will be a
Martin v. Löwis	c15bdef	2009-05-29 14:47:46 +0000	[diff] [blame]	966	pointer to a static string, on others, it will change at run-time
				967	(such as when the application invokes setlocale).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	968
				969	Error handling is set by errors which may also be set to NULL meaning to use
				970	the default handling defined for the codec. Default error handling for all
Georg Brandl	22b3431	2009-07-26 14:54:51 +0000	[diff] [blame]	971	built-in codecs is "strict" (:exc:`ValueError` is raised).
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	972
				973	The codecs all use a similar interface. Only deviations from the following
				974	generic ones are documented, for simplicity.
				975
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	976
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	977	Generic Codecs
				978	""""""""""""""
				979
				980	These are the generic codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	981
				982
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	983	.. c:function:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, \
				984	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	985
				986	Create a Unicode object by decoding size bytes of the encoded string s.
				987	encoding and errors have the same meaning as the parameters of the same name
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	988	in the :func:`str` built-in function. The codec to be used is looked up
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	989	using the Python codec registry. Return NULL if an exception was raised by
				990	the codec.
				991
				992
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	993	.. c:function:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, \
				994	const char *encoding, const char *errors)
				995
				996	Encode a Unicode object and return the result as a Python bytes object.
				997	encoding and errors have the same meaning as the parameters of the same
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	998	name in the Unicode :meth:`~str.encode` method. The codec to be used is looked up
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	999	using the Python codec registry. Return NULL if an exception was raised by
				1000	the codec.
				1001
				1002
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1003	.. c:function:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, \
				1004	const char *encoding, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1005
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1006	Encode the :c:type:`Py_UNICODE` buffer s of the given size and return a Python
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1007	bytes object. encoding and errors have the same meaning as the
Serhiy Storchaka	0b68a2d	2013-10-09 13:26:17 +0300	[diff] [blame]	1008	parameters of the same name in the Unicode :meth:`~str.encode` method. The codec
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1009	to be used is looked up using the Python codec registry. Return NULL if an
				1010	exception was raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1011
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1012	.. deprecated-removed:: 3.3 4.0
				1013	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1014	:c:func:`PyUnicode_AsEncodedString`.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1015
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1016
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1017	UTF-8 Codecs
				1018	""""""""""""
				1019
				1020	These are the UTF-8 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1021
				1022
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1023	.. c:function:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1024
				1025	Create a Unicode object by decoding size bytes of the UTF-8 encoded string
				1026	s. Return NULL if an exception was raised by the codec.
				1027
				1028
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1029	.. c:function:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, \
				1030	const char *errors, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1031
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1032	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF8`. If
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1033	consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be
				1034	treated as an error. Those bytes will not be decoded and the number of bytes
				1035	that have been decoded will be stored in consumed.
				1036
				1037
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1038	.. c:function:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1039
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1040	Encode a Unicode object using UTF-8 and return the result as a Python bytes
				1041	object. Error handling is "strict". Return NULL if an exception was
				1042	raised by the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1043
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1044
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1045	.. c:function:: char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
				1046
				1047	Return a pointer to the default encoding (UTF-8) of the Unicode object, and
				1048	store the size of the encoded representation (in bytes) in *size*. *size*
				1049	can be NULL; in this case no size will be stored.
				1050
				1051	In the case of an error, NULL is returned with an exception set and no
				1052	size is stored.
				1053
				1054	This caches the UTF-8 representation of the string in the Unicode object, and
				1055	subsequent calls will return a pointer to the same buffer. The caller is not
				1056	responsible for deallocating the buffer.
				1057
				1058	.. versionadded:: 3.3
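
The sketch below illustrates the ownership rule described above: the pointer
belongs to the Unicode object, so the data is copied out rather than freed.
The helper name ``copy_utf8`` is hypothetical, and the code assumes an
initialized interpreter.

.. code-block:: c

   #include <Python.h>
   #include <string.h>

   /* Hypothetical helper: copy the UTF-8 form of a str object into a
    * caller-supplied buffer. Returns the encoded size in bytes (without
    * the terminating null byte), or -1 with an exception set. */
   static Py_ssize_t
   copy_utf8(PyObject *unicode, char *out, size_t outsize)
   {
       Py_ssize_t size;
       /* The buffer is cached in the Unicode object: do not free it. */
       const char *utf8 = PyUnicode_AsUTF8AndSize(unicode, &size);
       if (utf8 == NULL) {
           return -1;
       }
       if ((size_t)size >= outsize) {
           PyErr_SetString(PyExc_ValueError, "output buffer too small");
           return -1;
       }
       /* Copy the data plus the null byte that terminates the cached buffer. */
       memcpy(out, utf8, (size_t)size + 1);
       return size;
   }

The copy stays valid after the Unicode object is released, whereas the
pointer returned by :c:func:`PyUnicode_AsUTF8AndSize` does not.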
				1059
				1060
				1061	.. c:function:: char* PyUnicode_AsUTF8(PyObject *unicode)
				1062
				1063	As :c:func:`PyUnicode_AsUTF8AndSize`, but does not store the size.
				1064
				1065	.. versionadded:: 3.3
				1066
				1067
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1068	.. c:function:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				1069
				1070	Encode the :c:type:`Py_UNICODE` buffer s of the given size using UTF-8 and
				1071	return a Python bytes object. Return NULL if an exception was raised by
				1072	the codec.
				1073
				1074	.. deprecated-removed:: 3.3 4.0
				1075	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1076	:c:func:`PyUnicode_AsUTF8String` or :c:func:`PyUnicode_AsUTF8AndSize`.
				1077
				1078
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1079	UTF-32 Codecs
				1080	"""""""""""""
				1081
				1082	These are the UTF-32 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1083
				1084
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1085	.. c:function:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, \
				1086	const char *errors, int *byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1087
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1088	Decode size bytes from a UTF-32 encoded buffer string and return the
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1089	corresponding Unicode object. errors (if non-NULL) defines the error
				1090	handling. It defaults to "strict".
				1091
				1092	If byteorder is non-NULL, the decoder starts decoding using the given byte
				1093	order::
				1094
				1095	*byteorder == -1: little endian
				1096	*byteorder == 0: native order
				1097	*byteorder == 1: big endian
				1098
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1099	If ``*byteorder`` is zero, and the first four bytes of the input data are a
				1100	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				1101	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				1102	``1``, any byte order mark is copied to the output.
				1103
				1104	After completion, ``*byteorder`` is set to the current byte order at the end
				1105	of input data.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1106
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1107	If byteorder is NULL, the codec starts in native order mode.
				1108
				1109	Return NULL if an exception was raised by the codec.
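
As a hedged illustration of the in/out *byteorder* argument, the sketch below
passes ``0`` so that a leading BOM selects the byte order. The helper name
``decode_utf32_auto`` is hypothetical, and an initialized interpreter is
assumed.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: decode UTF-32 data and let a leading BOM choose
    * the byte order. After a successful call, *detected is -1 (little
    * endian) or 1 (big endian) if a BOM was found. */
   static PyObject *
   decode_utf32_auto(const char *buf, Py_ssize_t len, int *detected)
   {
       *detected = 0;   /* 0 asks the decoder to look for a BOM */
       return PyUnicode_DecodeUTF32(buf, len, NULL, detected);
   }

For the eight bytes ``FF FE 00 00 41 00 00 00`` the result is the
one-character string ``A`` and ``*detected`` is set to ``-1``.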
				1110
				1111
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1112	.. c:function:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, \
				1113	const char *errors, int *byteorder, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1114
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1115	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF32`. If
				1116	consumed is not NULL, :c:func:`PyUnicode_DecodeUTF32Stateful` will not treat
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1117	trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
				1118	by four) as an error. Those bytes will not be decoded and the number of bytes
				1119	that have been decoded will be stored in consumed.
				1120
				1121
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1122	.. c:function:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				1123
				1124	Return a Python bytes object using the UTF-32 encoding in native byte
				1125	order. The output always starts with a BOM. Error handling is "strict".
				1126	Return NULL if an exception was raised by the codec.
				1127
				1128
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1129	.. c:function:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, \
				1130	const char *errors, int byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1131
				1132	Return a Python bytes object holding the UTF-32 encoded value of the Unicode
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1133	data in s. Output is written according to the following byte order::
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1134
				1135	byteorder == -1: little endian
				1136	byteorder == 0: native byte order (writes a BOM)
				1137	byteorder == 1: big endian
				1138
				1139	If byteorder is ``0``, the output string will always start with the Unicode BOM
				1140	(U+FEFF). In the other two modes, no BOM is prepended.
				1141
				1142	If Py_UNICODE_WIDE is not defined, surrogate pairs will be output
				1143	as a single code point.
				1144
				1145	Return NULL if an exception was raised by the codec.
				1146
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1147	.. deprecated-removed:: 3.3 4.0
				1148	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1149	:c:func:`PyUnicode_AsUTF32String`.
				1150
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1151
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1152	UTF-16 Codecs
				1153	"""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1154
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1155	These are the UTF-16 codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1156
				1157
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1158	.. c:function:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, \
				1159	const char *errors, int *byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1160
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1161	Decode size bytes from a UTF-16 encoded buffer string and return the
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1162	corresponding Unicode object. errors (if non-NULL) defines the error
				1163	handling. It defaults to "strict".
				1164
				1165	If byteorder is non-NULL, the decoder starts decoding using the given byte
				1166	order::
				1167
				1168	*byteorder == -1: little endian
				1169	*byteorder == 0: native order
				1170	*byteorder == 1: big endian
				1171
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1172	If ``*byteorder`` is zero, and the first two bytes of the input data are a
				1173	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				1174	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				1175	``1``, any byte order mark is copied to the output (where it will result in
				1176	either a ``\ufeff`` or a ``\ufffe`` character).
				1177
				1178	After completion, ``*byteorder`` is set to the current byte order at the end
				1179	of input data.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1180
				1181	If byteorder is NULL, the codec starts in native order mode.
				1182
				1183	Return NULL if an exception was raised by the codec.
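
The following sketch shows the explicit byte order case, in which a BOM is
not stripped. The helper name ``decode_utf16_le`` is hypothetical, and an
initialized interpreter is assumed.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: decode little-endian UTF-16 without BOM
    * detection. Because the byte order is given explicitly, a leading BOM
    * is kept in the result as U+FEFF. */
   static PyObject *
   decode_utf16_le(const char *buf, Py_ssize_t len)
   {
       int byteorder = -1;   /* force little endian */
       return PyUnicode_DecodeUTF16(buf, len, "strict", &byteorder);
   }

For the four bytes ``FF FE 41 00`` the result has two characters, U+FEFF
followed by ``A``. Callers that want the BOM removed can start with a byte
order of ``0`` instead.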
				1184
				1185
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1186	.. c:function:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, \
				1187	const char *errors, int *byteorder, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1188
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1189	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF16`. If
				1190	consumed is not NULL, :c:func:`PyUnicode_DecodeUTF16Stateful` will not treat
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1191	trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
				1192	split surrogate pair) as an error. Those bytes will not be decoded and the
				1193	number of bytes that have been decoded will be stored in consumed.
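
A small sketch of how *consumed* can be turned into a count of bytes to carry
over to the next call. The helper name ``decode_utf16_partial`` is
hypothetical, and an initialized interpreter is assumed.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: decode as much little-endian UTF-16 as is
    * complete and report how many trailing bytes were left undecoded. */
   static PyObject *
   decode_utf16_partial(const char *buf, Py_ssize_t len, Py_ssize_t *leftover)
   {
       int byteorder = -1;
       Py_ssize_t consumed = 0;
       PyObject *text = PyUnicode_DecodeUTF16Stateful(buf, len, "strict",
                                                      &byteorder, &consumed);
       if (text != NULL) {
           *leftover = len - consumed;
       }
       return text;
   }

For the three bytes ``41 00 42`` the result is ``A`` and one byte is reported
as left over.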
				1194
				1195
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1196	.. c:function:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				1197
				1198	Return a Python bytes object using the UTF-16 encoding in native byte
				1199	order. The output always starts with a BOM. Error handling is "strict".
				1200	Return NULL if an exception was raised by the codec.
				1201
				1202
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1203	.. c:function:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, \
				1204	const char *errors, int byteorder)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1205
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1206	Return a Python bytes object holding the UTF-16 encoded value of the Unicode
Benjamin Peterson	4ac9ce4	2009-10-04 14:49:41 +0000	[diff] [blame]	1207	data in s. Output is written according to the following byte order::
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1208
				1209	byteorder == -1: little endian
				1210	byteorder == 0: native byte order (writes a BOM)
				1211	byteorder == 1: big endian
				1212
				1213	If byteorder is ``0``, the output string will always start with the Unicode BOM
				1214	(U+FEFF). In the other two modes, no BOM is prepended.
				1215
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1216	If Py_UNICODE_WIDE is defined, a single :c:type:`Py_UNICODE` value may get
				1217	represented as a surrogate pair. If it is not defined, each :c:type:`Py_UNICODE`
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1218	value is interpreted as a UCS-2 character.
				1219
				1220	Return NULL if an exception was raised by the codec.
				1221
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1222	.. deprecated-removed:: 3.3 4.0
				1223	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1224	:c:func:`PyUnicode_AsUTF16String`.
				1225
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1226
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1227	UTF-7 Codecs
				1228	""""""""""""
				1229
				1230	These are the UTF-7 codec APIs:
				1231
				1232
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1233	.. c:function:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1234
				1235	Create a Unicode object by decoding size bytes of the UTF-7 encoded string
				1236	s. Return NULL if an exception was raised by the codec.
				1237
				1238
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1239	.. c:function:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, \
				1240	const char *errors, Py_ssize_t *consumed)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1241
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1242	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeUTF7`. If
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1243	consumed is not NULL, trailing incomplete UTF-7 base-64 sections will not
				1244	be treated as an error. Those bytes will not be decoded and the number of
				1245	bytes that have been decoded will be stored in consumed.
				1246
				1247
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1248	.. c:function:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, \
				1249	int base64SetO, int base64WhiteSpace, const char *errors)
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1250
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1251	Encode the :c:type:`Py_UNICODE` buffer of the given size using UTF-7 and
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1252	return a Python bytes object. Return NULL if an exception was raised by
				1253	the codec.
				1254
				1255	If base64SetO is nonzero, "Set O" (punctuation that otherwise has no
				1256	special meaning) will be encoded in base-64. If base64WhiteSpace is
				1257	nonzero, whitespace will be encoded in base-64. Both are set to zero for the
				1258	Python "utf-7" codec.
				1259
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1260	.. deprecated-removed:: 3.3 4.0
				1261	Part of the old-style :c:type:`Py_UNICODE` API.
				1262
				1263	.. XXX replace with what?
				1264
Georg Brandl	8477f82	2010-08-02 20:05:19 +0000	[diff] [blame]	1265
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1266	Unicode-Escape Codecs
				1267	"""""""""""""""""""""
				1268
				1269	These are the "Unicode Escape" codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1270
				1271
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1272	.. c:function:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, \
				1273	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1274
				1275	Create a Unicode object by decoding size bytes of the Unicode-Escape encoded
				1276	string s. Return NULL if an exception was raised by the codec.
				1277
				1278
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1279	.. c:function:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				1280
				1281	Encode a Unicode object using Unicode-Escape and return the result as a Python
				1282	bytes object. Error handling is "strict". Return NULL if an exception was
				1283	raised by the codec.
				1284
				1285
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1286	.. c:function:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1287
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1288	Encode the :c:type:`Py_UNICODE` buffer of the given size using Unicode-Escape and
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1289	return a Python bytes object. Return NULL if an exception was raised by the
				1290	codec.
				1291
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1292	.. deprecated-removed:: 3.3 4.0
				1293	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1294	:c:func:`PyUnicode_AsUnicodeEscapeString`.
				1295
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1296
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1297	Raw-Unicode-Escape Codecs
				1298	"""""""""""""""""""""""""
				1299
				1300	These are the "Raw Unicode Escape" codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1301
				1302
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1303	.. c:function:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, \
				1304	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1305
				1306	Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape
				1307	encoded string s. Return NULL if an exception was raised by the codec.
				1308
				1309
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1310	.. c:function:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				1311
				1312	Encode a Unicode object using Raw-Unicode-Escape and return the result as a
				1313	Python bytes object. Error handling is "strict". Return NULL if an exception
				1314	was raised by the codec.
				1315
				1316
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1317	.. c:function:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, \
				1318	Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1319
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1320	Encode the :c:type:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1321	and return a Python bytes object. Return NULL if an exception was raised by
				1322	the codec.
				1323
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1324	.. deprecated-removed:: 3.3 4.0
				1325	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1326	:c:func:`PyUnicode_AsRawUnicodeEscapeString`.
				1327
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1328
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1329	Latin-1 Codecs
				1330	""""""""""""""
				1331
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1332	These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
				1333	ordinals and only these are accepted by the codecs during encoding.
				1334
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1335
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1336	.. c:function:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1337
				1338	Create a Unicode object by decoding size bytes of the Latin-1 encoded string
				1339	s. Return NULL if an exception was raised by the codec.
				1340
				1341
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1342	.. c:function:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				1343
				1344	Encode a Unicode object using Latin-1 and return the result as a Python bytes
				1345	object. Error handling is "strict". Return NULL if an exception was
				1346	raised by the codec.
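
Because error handling is "strict", a character outside the first 256 code
points makes the call fail. The sketch below turns that into a yes/no check;
the helper name ``fits_latin1`` is hypothetical, and an initialized
interpreter is assumed.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: return 1 if the str object can be encoded as
    * Latin-1, 0 if it cannot, and -1 on any other error. */
   static int
   fits_latin1(PyObject *unicode)
   {
       PyObject *bytes = PyUnicode_AsLatin1String(unicode);
       if (bytes != NULL) {
           Py_DECREF(bytes);
           return 1;
       }
       if (PyErr_ExceptionMatches(PyExc_UnicodeEncodeError)) {
           PyErr_Clear();   /* an unencodable character is expected here */
           return 0;
       }
       return -1;
   }

With this helper, ``é`` (U+00E9) is reported as encodable and ``€`` (U+20AC)
is not.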
				1347
				1348
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1349	.. c:function:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1350
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1351	Encode the :c:type:`Py_UNICODE` buffer of the given size using Latin-1 and
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1352	return a Python bytes object. Return NULL if an exception was raised by
				1353	the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1354
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1355	.. deprecated-removed:: 3.3 4.0
				1356	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1357	:c:func:`PyUnicode_AsLatin1String`.
				1358
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1359
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1360	ASCII Codecs
				1361	""""""""""""
				1362
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1363	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				1364	codes generate errors.
				1365
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1366
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1367	.. c:function:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1368
				1369	Create a Unicode object by decoding size bytes of the ASCII encoded string
				1370	s. Return NULL if an exception was raised by the codec.
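
The *errors* argument selects what happens to bytes outside the 7-bit range.
A minimal sketch, assuming an initialized interpreter; the helper name
``decode_ascii_lenient`` is hypothetical.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: decode ASCII data, substituting U+FFFD for any
    * byte that is not 7-bit ASCII instead of raising UnicodeDecodeError. */
   static PyObject *
   decode_ascii_lenient(const char *buf, Py_ssize_t len)
   {
       return PyUnicode_DecodeASCII(buf, len, "replace");
   }

For the three bytes ``61 FF 62`` the result is ``a``, U+FFFD, ``b``. Passing
``"strict"`` or NULL for *errors* would instead return NULL with an exception
set.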
				1371
				1372
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1373	.. c:function:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				1374
				1375	Encode a Unicode object using ASCII and return the result as a Python bytes
				1376	object. Error handling is "strict". Return NULL if an exception was
				1377	raised by the codec.
				1378
				1379
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1380	.. c:function:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1381
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1382	Encode the :c:type:`Py_UNICODE` buffer of the given size using ASCII and
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1383	return a Python bytes object. Return NULL if an exception was raised by
				1384	the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1385
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1386	.. deprecated-removed:: 3.3 4.0
				1387	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1388	:c:func:`PyUnicode_AsASCIIString`.
				1389
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1390
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1391	Character Map Codecs
				1392	""""""""""""""""""""
				1393
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1394	This codec is special in that it can be used to implement many different codecs
				1395	(and this is in fact what was done to obtain most of the standard codecs
				1396	included in the :mod:`encodings` package). The codec uses mappings to encode and
				1397	decode characters.
				1398
				1399	Decoding mappings must map single string characters to single Unicode
				1400	characters, integers (which are then interpreted as Unicode ordinals) or None
				1401	(meaning "undefined mapping" and causing an error).
				1402
				1403	Encoding mappings must map single Unicode characters to single string
				1404	characters, integers (which are then interpreted as Latin-1 ordinals) or None
				1405	(meaning "undefined mapping" and causing an error).
				1406
				1407	The mapping objects provided need only support the ``__getitem__`` mapping
				1408	interface.
				1409
				1410	If a character lookup fails with a LookupError, the character is copied as-is,
				1411	meaning that its ordinal value will be interpreted as a Unicode or Latin-1 ordinal,
				1412	respectively. Because of this, mappings only need to contain those mappings which map
				1413	characters to different code points.
				1414
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1415	These are the mapping codec APIs:
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1416
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1417	.. c:function:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, \
				1418	PyObject *mapping, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1419
				1420	Create a Unicode object by decoding size bytes of the encoded string s using
				1421	the given mapping object. Return NULL if an exception was raised by the
				1422	codec. If mapping is NULL, Latin-1 decoding will be done. Otherwise, it can be a
				1423	dictionary mapping byte values as described above, or a Unicode string, which is treated as a lookup table.
				1424	Byte values greater than or equal to the length of the string and U+FFFE "characters" are
				1425	treated as "undefined mapping".
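
A minimal sketch of the lookup-table form of *mapping*, assuming an
initialized interpreter. The helper name ``decode_with_table`` and the
three-character table are hypothetical.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: decode bytes through a Unicode string used as a
    * lookup table. Byte 0 maps to 'x', 1 to 'y' and 2 to 'z'; any other
    * byte value is an undefined mapping and raises an error. */
   static PyObject *
   decode_with_table(const char *buf, Py_ssize_t len)
   {
       PyObject *table = PyUnicode_FromString("xyz");
       if (table == NULL) {
           return NULL;
       }
       PyObject *text = PyUnicode_DecodeCharmap(buf, len, table, "strict");
       Py_DECREF(table);
       return text;
   }

The bytes ``02 00 01`` decode to ``zxy``, while a byte of ``03`` lies past
the end of the table and makes the call return NULL.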
				1426
				1427
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1428	.. c:function:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1429
				1430	Encode a Unicode object using the given mapping object and return the result
				1431	as a Python bytes object. Error handling is "strict". Return NULL if an
				1432	exception was raised by the codec.
				1433
				1434	The following codec API is special in that it maps Unicode to Unicode.
				1435
				1436
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1437	.. c:function:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, \
				1438	PyObject *table, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1439
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1440	Translate a :c:type:`Py_UNICODE` buffer of the given size by applying a
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1441	character mapping table to it and return the resulting Unicode object. Return
				1442	NULL when an exception was raised by the codec.
				1443
				1444	The mapping table must map Unicode ordinal integers to Unicode ordinal
				1445	integers or None (causing deletion of the character).
				1446
				1447	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				1448	and sequences work well. Unmapped character ordinals (ones which cause a
				1449	:exc:`LookupError`) are left untouched and are copied as-is.
				1450
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1451	.. deprecated-removed:: 3.3 4.0
				1452	Part of the old-style :c:type:`Py_UNICODE` API.
				1453
				1454	.. XXX replace with what?
Jeroen Ruigrok van der Werven	47a7d70	2009-04-27 05:43:17 +0000	[diff] [blame]	1455
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1456
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1457	.. c:function:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, \
				1458	PyObject *mapping, const char *errors)
				1459
				1460	Encode the :c:type:`Py_UNICODE` buffer of the given size using the given
				1461	mapping object and return a Python bytes object. Return NULL if an
				1462	exception was raised by the codec.
				1463
				1464	.. deprecated-removed:: 3.3 4.0
				1465	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
				1466	:c:func:`PyUnicode_AsCharmapString`.
				1467
				1468
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1469	MBCS codecs for Windows
				1470	"""""""""""""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1471
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1472	These are the MBCS codec APIs. They are currently only available on Windows and
				1473	use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
				1474	DBCS) is a class of encodings, not just one. The target encoding is defined by
				1475	the user settings on the machine running the codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1476
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1477	.. c:function:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1478
				1479	Create a Unicode object by decoding size bytes of the MBCS encoded string s.
				1480	Return NULL if an exception was raised by the codec.
				1481
				1482
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1483	.. c:function:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, \
				1484	const char *errors, Py_ssize_t *consumed)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1485
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1486	If consumed is NULL, behave like :c:func:`PyUnicode_DecodeMBCS`. If
				1487	consumed is not NULL, :c:func:`PyUnicode_DecodeMBCSStateful` will not decode
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1488	a trailing lead byte and the number of bytes that have been decoded will be stored
				1489	in consumed.
				1490
				1491
Antoine Pitrou	e6b99a1	2011-10-22 21:56:20 +0200	[diff] [blame]	1492	.. c:function:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
				1493
				1494	Encode a Unicode object using MBCS and return the result as a Python bytes
				1495	object. Error handling is "strict". Return NULL if an exception was
				1496	raised by the codec.
				1497
				1498
Victor Stinner	b682101	2011-12-09 00:18:11 +0100	[diff] [blame]	1499	.. c:function:: PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors)
				1500
				1501	Encode the Unicode object using the specified code page and return a Python
				1502	bytes object. Return NULL if an exception was raised by the codec. Use
				1503	:c:data:`CP_ACP` code page to get the MBCS encoder.
				1504
				1505	.. versionadded:: 3.3
				1506
				1507
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1508	.. c:function:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1509
Ezio Melotti	c1f0577	2011-04-14 07:50:25 +0300	[diff] [blame]	1510	Encode the :c:type:`Py_UNICODE` buffer of the given size using MBCS and return
Benjamin Peterson	b6eba4f	2009-01-13 23:14:04 +0000	[diff] [blame]	1511	a Python bytes object. Return NULL if an exception was raised by the
				1512	codec.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1513
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1514	.. deprecated-removed:: 3.3 4.0
				1515	Part of the old-style :c:type:`Py_UNICODE` API; please migrate to using
Victor Stinner	b682101	2011-12-09 00:18:11 +0100	[diff] [blame]	1516	:c:func:`PyUnicode_AsMBCSString` or :c:func:`PyUnicode_EncodeCodePage`.
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1517
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1518
Victor Stinner	77c3862	2010-05-14 15:58:55 +0000	[diff] [blame]	1519	Methods & Slots
				1520	"""""""""""""""
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1521
				1522
				1523	.. _unicodemethodsandslots:
				1524
				1525	Methods and Slot Functions
				1526	^^^^^^^^^^^^^^^^^^^^^^^^^^
				1527
				1528	The following APIs are capable of handling Unicode objects and strings on input
				1529	(we refer to them as strings in the descriptions) and return Unicode objects or
				1530	integers as appropriate.
				1531
				1532	They all return NULL or ``-1`` if an exception occurs.
				1533
				1534
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1535	.. c:function:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1536
				1537	Concatenate two strings, giving a new Unicode string.
				1538
				1539
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1540	.. c:function:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1541
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1542	Split a string, giving a list of Unicode strings. If sep is NULL, splitting
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1543	will be done at all whitespace substrings. Otherwise, splits occur at the given
				1544	separator. At most maxsplit splits will be done. If maxsplit is negative, no limit is
				1545	set. Separators are not included in the resulting list.
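
A short sketch of the *maxsplit* argument, assuming an initialized
interpreter. The helper name ``split_key_value`` is hypothetical.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: split "key=value" at the first "=" only, so that
    * later "=" characters stay in the value part. Returns a new list, or
    * NULL with an exception set. */
   static PyObject *
   split_key_value(PyObject *line)
   {
       PyObject *sep = PyUnicode_FromString("=");
       if (sep == NULL) {
           return NULL;
       }
       PyObject *parts = PyUnicode_Split(line, sep, 1);
       Py_DECREF(sep);
       return parts;
   }

For the input ``a=b=c`` the returned list has two items, ``a`` and ``b=c``.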
				1546
				1547
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1548	.. c:function:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1549
				1550	Split a Unicode string at line breaks, returning a list of Unicode strings.
				1551	CRLF is considered to be one line break. If keepend is 0, the line break
				1552	characters are not included in the resulting strings.
				1553
				1554
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1555	.. c:function:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, \
				1556	const char *errors)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1557
				1558	Translate a string by applying a character mapping table to it and return the
				1559	resulting Unicode object.
				1560
				1561	The mapping table must map Unicode ordinal integers to Unicode ordinal integers
				1562	or None (causing deletion of the character).
				1563
				1564	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				1565	and sequences work well. Unmapped character ordinals (ones which cause a
				1566	:exc:`LookupError`) are left untouched and are copied as-is.
				1567
				1568	*errors* has the usual meaning for codecs. It may be ``NULL``, which indicates
				1569	that the default error handling should be used.
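
As an illustration, the sketch below builds a small dictionary table and
applies it. The helper name ``translate_demo`` and the particular mapping are
assumptions made for the example; an initialized interpreter is assumed.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: map 'a' to 'A', delete 'x', and leave every other
      character untouched.  Returns a new reference, or NULL on error. */
   static PyObject *
   translate_demo(PyObject *str)
   {
       /* Equivalent to the Python dict {ord('a'): ord('A'), ord('x'): None} */
       PyObject *table = Py_BuildValue("{i:i,i:O}", 'a', 'A', 'x', Py_None);
       PyObject *result;
       if (table == NULL)
           return NULL;
       result = PyUnicode_Translate(str, table, NULL);  /* default errors */
       Py_DECREF(table);
       return result;
   }

Characters without an entry in the table (here ``b``) raise a
:exc:`LookupError` internally and are therefore copied unchanged.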
				1570
				1571
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1572	.. c:function:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1573
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1574	Join a sequence of strings using the given *separator* and return the resulting
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1575	Unicode string.
				1576
				1577
Victor Stinner	13d3aa5	2014-10-09 11:11:25 +0200	[diff] [blame]	1578	.. c:function:: Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, \
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1579	Py_ssize_t start, Py_ssize_t end, int direction)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1580
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1581	Return ``1`` if *substr* matches ``str[start:end]`` at the given tail end
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1582	(*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
				1583	``0`` otherwise. Return ``-1`` if an error occurred.
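
The two directions can be sketched as follows. ``ascii_tailmatch`` is a
hypothetical wrapper, not an API function, and an initialized interpreter is
assumed.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical wrapper: returns 1 if `affix` is a prefix (direction -1)
      or a suffix (direction 1) of `text`, 0 if not, -1 on error. */
   static int
   ascii_tailmatch(const char *text, const char *affix, int direction)
   {
       int result = -1;
       PyObject *a = NULL;
       PyObject *s = PyUnicode_FromString(text);
       if (s == NULL)
           return -1;
       a = PyUnicode_FromString(affix);
       if (a != NULL) {
           /* Match against the whole string: str[0:len(str)] */
           result = (int)PyUnicode_Tailmatch(s, a, 0, PyUnicode_GetLength(s),
                                             direction);
       }
       Py_XDECREF(a);
       Py_DECREF(s);
       return result;
   }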
				1584
				1585
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1586	.. c:function:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, \
				1587	Py_ssize_t start, Py_ssize_t end, int direction)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1588
Ezio Melotti	95cd91c	2011-04-14 07:43:53 +0300	[diff] [blame]	1589	Return the first position of *substr* in ``str[start:end]`` using the given
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1590	*direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
				1591	backward search). The return value is the index of the first match; a value of
				1592	``-1`` indicates that no match was found, and ``-2`` indicates that an error
				1593	occurred and an exception has been set.
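
Because "not found" and "error" use different sentinels, callers usually test
for ``-2`` before ``-1``. A minimal sketch, assuming an initialized
interpreter; the wrapper name ``ascii_find`` is hypothetical.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical wrapper: returns the match index, -1 if `needle` does not
      occur in `text`, or -2 if an error occurred (exception set). */
   static Py_ssize_t
   ascii_find(const char *text, const char *needle, int direction)
   {
       Py_ssize_t pos = -2;
       PyObject *n = NULL;
       PyObject *s = PyUnicode_FromString(text);
       if (s == NULL)
           return -2;
       n = PyUnicode_FromString(needle);
       if (n != NULL) {
           /* Search the whole string: str[0:len(str)] */
           pos = PyUnicode_Find(s, n, 0, PyUnicode_GetLength(s), direction);
       }
       Py_XDECREF(n);
       Py_DECREF(s);
       return pos;
   }

For ``"abcab"`` and ``"ab"``, a forward search reports index 0 and a backward
search reports index 3.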
				1594
				1595
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1596	.. c:function:: Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, \
				1597	Py_ssize_t start, Py_ssize_t end, int direction)
Martin v. Löwis	d63a3b8	2011-09-28 07:41:54 +0200	[diff] [blame]	1598
				1599	Return the first position of the character *ch* in ``str[start:end]`` using
				1600	the given *direction* (*direction* == 1 means to do a forward search,
				1601	*direction* == -1 a backward search). The return value is the index of the
				1602	first match; a value of ``-1`` indicates that no match was found, and ``-2``
				1603	indicates that an error occurred and an exception has been set.
				1604
Georg Brandl	ee12f44	2011-09-28 21:51:06 +0200	[diff] [blame]	1605	.. versionadded:: 3.3
				1606
Martin v. Löwis	d63a3b8	2011-09-28 07:41:54 +0200	[diff] [blame]	1607
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1608	.. c:function:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, \
				1609	Py_ssize_t start, Py_ssize_t end)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1610
				1611	Return the number of non-overlapping occurrences of *substr* in
				1612	``str[start:end]``. Return ``-1`` if an error occurred.
				1613
				1614
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1615	.. c:function:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, \
				1616	PyObject *replstr, Py_ssize_t maxcount)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1617
				1618	Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
				1619	return the resulting Unicode object. *maxcount* == -1 means replace all
				1620	occurrences.
				1621
				1622
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1623	.. c:function:: int PyUnicode_Compare(PyObject *left, PyObject *right)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1624
				1625	Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
				1626	respectively.
				1627
				1628
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1629	.. c:function:: int PyUnicode_CompareWithASCIIString(PyObject *uni, char *string)
Benjamin Peterson	c22ed14	2008-07-01 19:12:34 +0000	[diff] [blame]	1630
				1631	Compare a unicode object, *uni*, with *string* and return -1, 0, 1 for less
Victor Stinner	80e788a	2010-12-28 23:39:51 +0000	[diff] [blame]	1632	than, equal, and greater than, respectively. It is best to pass only
				1633	ASCII-encoded strings, but the function interprets the input string as
Zachary Ware	780b585	2014-06-06 09:13:18 -0500	[diff] [blame]	1634	ISO-8859-1 if it contains non-ASCII characters.
Benjamin Peterson	c22ed14	2008-07-01 19:12:34 +0000	[diff] [blame]	1635
				1636
Eli Bendersky	0813168	2012-06-03 08:07:47 +0300	[diff] [blame]	1637	.. c:function:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1638
				1639	Rich compare two unicode strings and return one of the following:
				1640
				1641	* ``NULL`` in case an exception was raised
				1642	* :const:`Py_True` or :const:`Py_False` for successful comparisons
				1643	* :const:`Py_NotImplemented` in case the type combination is unknown
				1644
				1645	Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
				1646	:exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
				1647	with a :exc:`UnicodeDecodeError`.
				1648
				1649	Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
				1650	:const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
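
One way to fold the three possible results into a C ``int`` is sketched below.
The wrapper ``unicode_compare_op`` is hypothetical, and turning
:const:`Py_NotImplemented` into a :exc:`TypeError` is a choice made for this
example rather than required behavior.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical wrapper: returns 1 or 0 for a successful comparison and
      -1 with an exception set otherwise. */
   static int
   unicode_compare_op(PyObject *left, PyObject *right, int op)
   {
       int truth;
       PyObject *res = PyUnicode_RichCompare(left, right, op);
       if (res == NULL)
           return -1;                      /* exception already set */
       if (res == Py_NotImplemented) {
           Py_DECREF(res);
           PyErr_SetString(PyExc_TypeError,
                           "both operands must be unicode strings");
           return -1;
       }
       truth = (res == Py_True);
       Py_DECREF(res);
       return truth;
   }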
				1651
				1652
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1653	.. c:function:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1654
				1655	Return a new string object from *format* and *args*; this is analogous to
Benjamin Peterson	102488b	2014-07-19 16:34:33 -0700	[diff] [blame]	1656	``format % args``.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1657
				1658
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1659	.. c:function:: int PyUnicode_Contains(PyObject *container, PyObject *element)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1660
				1661	Check whether *element* is contained in *container* and return true or false
				1662	accordingly.
				1663
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1664	*element* has to coerce to a one element Unicode string. ``-1`` is returned
				1665	if there was an error.
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1666
				1667
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1668	.. c:function:: void PyUnicode_InternInPlace(PyObject **string)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1669
				1670	Intern the argument *\*string* in place. The argument must be the address of a
				1671	pointer variable pointing to a Python unicode string object. If there is an
				1672	existing interned string that is the same as *\*string*, it sets *\*string* to
				1673	it (decrementing the reference count of the old string object and incrementing
				1674	the reference count of the interned string object), otherwise it leaves
				1675	*\*string* alone and interns it (incrementing its reference count).
				1676	(Clarification: even though there is a lot of talk about reference counts, think
				1677	of this function as reference-count-neutral; you own the object after the call
				1678	if and only if you owned it before the call.)
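
The reference-count-neutral behavior can be sketched as follows, assuming an
initialized interpreter. ``interned_same_object`` is a hypothetical helper:
it interns two equal but distinct strings and checks that both pointers end up
referring to the same object, while each pointer still owns exactly one
reference.

.. code-block:: c

   #include <Python.h>

   /* Hypothetical helper: returns 1 if two separately created copies of
      `text` share one object after interning, 0 if not, -1 on error. */
   static int
   interned_same_object(const char *text)
   {
       int same;
       PyObject *a = PyUnicode_FromString(text);
       PyObject *b = PyUnicode_FromString(text);
       if (a == NULL || b == NULL) {
           Py_XDECREF(a);
           Py_XDECREF(b);
           return -1;
       }
       PyUnicode_InternInPlace(&a);  /* a may now point at another object */
       PyUnicode_InternInPlace(&b);  /* b is redirected to the interned one */
       same = (a == b);
       /* We owned one reference per pointer before the calls and still do. */
       Py_DECREF(a);
       Py_DECREF(b);
       return same;
   }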
				1679
				1680
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1681	.. c:function:: PyObject* PyUnicode_InternFromString(const char *v)
Georg Brandl	54a3faa	2008-01-20 09:30:57 +0000	[diff] [blame]	1682
Georg Brandl	60203b4	2010-10-06 10:11:56 +0000	[diff] [blame]	1683	A combination of :c:func:`PyUnicode_FromString` and
Georg Brandl	db6c7f5	2011-10-07 11:19:11 +0200	[diff] [blame]	1684	:c:func:`PyUnicode_InternInPlace`, returning either a new unicode string
				1685	object that has been interned, or a new ("owned") reference to an earlier
				1686	interned string object with the same value.