Blame - Doc/c-api/unicode.rst - platform/external/python/cpython2

blob: 7fce170b5e220ba99684d2c45833e914e9d1d967 [file] [log] [blame]

Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	1	.. highlightlang:: c
				2
				3	.. _unicodeobjects:
				4
				5	Unicode Objects and Codecs
				6	--------------------------
				7
				8	.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
				9
				10	Unicode Objects
				11	^^^^^^^^^^^^^^^
				12
				13
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	14	Unicode Type
				15	""""""""""""
				16
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	17	These are the basic Unicode object types used for the Unicode implementation in
				18	Python:
				19
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	20
				21	.. ctype:: Py_UNICODE
				22
				23	This type represents the storage type which is used by Python internally as
				24	the basis for holding Unicode ordinals. Python's default builds use a 16-bit type
				25	for :ctype:`Py_UNICODE` and store Unicode values internally as UCS2. It is also
				26	possible to build a UCS4 version of Python (most recent Linux distributions come
				27	with UCS4 builds of Python). These builds then use a 32-bit type for
				28	:ctype:`Py_UNICODE` and store Unicode data internally as UCS4. On platforms
				29	where :ctype:`wchar_t` is available and compatible with the chosen Python
				30	Unicode build variant, :ctype:`Py_UNICODE` is a typedef alias for
				31	:ctype:`wchar_t` to enhance native platform compatibility. On all other
				32	platforms, :ctype:`Py_UNICODE` is a typedef alias for either :ctype:`unsigned
				33	short` (UCS2) or :ctype:`unsigned long` (UCS4).
				34
				35	Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
				36	this in mind when writing extensions or interfaces.
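
The practical difference between the two build variants shows up for code points above U+FFFF. The following is a minimal sketch in plain C (it does not use the Python API, and ``to_utf16_units`` is a hypothetical helper name) of how such a code point needs two 16-bit storage units, as in a UCS2 build, where a UCS4 build stores it in a single 32-bit unit:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the Python C API.
   Split the code point cp into 16-bit storage units, the way a narrow
   (UCS2) build has to represent it. Writes 1 or 2 units to out and
   returns the number written; returns 0 for values above U+10FFFF. */
size_t
to_utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* fits in one unit */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

Extension code that indexes a :ctype:`Py_UNICODE` buffer therefore sees a different number of elements for the same text depending on the build, which is one reason the two variants cannot share binaries.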
				37
				38
				39	.. ctype:: PyUnicodeObject
				40
				41	This subtype of :ctype:`PyObject` represents a Python Unicode object.
				42
				43
				44	.. cvar:: PyTypeObject PyUnicode_Type
				45
				46	This instance of :ctype:`PyTypeObject` represents the Python Unicode type. It
				47	is exposed to Python code as ``unicode`` and ``types.UnicodeType``.
				48
				49	The following APIs are really C macros and can be used to do fast checks and to
				50	access internal read-only data of Unicode objects:
				51
				52
				53	.. cfunction:: int PyUnicode_Check(PyObject *o)
				54
				55	Return true if the object o is a Unicode object or an instance of a Unicode
				56	subtype.
				57
				58	.. versionchanged:: 2.2
				59	Allowed subtypes to be accepted.
				60
				61
				62	.. cfunction:: int PyUnicode_CheckExact(PyObject *o)
				63
				64	Return true if the object o is a Unicode object, but not an instance of a
				65	subtype.
				66
				67	.. versionadded:: 2.2
				68
				69
				70	.. cfunction:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
				71
				72	Return the size of the object, counted in :ctype:`Py_UNICODE` elements. o has
				73	to be a :ctype:`PyUnicodeObject` (not checked).
				74
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	75	.. versionchanged:: 2.5
				76	This function returned an :ctype:`int` type. This might require changes
				77	in your code for properly supporting 64-bit systems.
				78
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	79
				80	.. cfunction:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
				81
				82	Return the size of the object's internal buffer in bytes. o has to be a
				83	:ctype:`PyUnicodeObject` (not checked).
				84
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	85	.. versionchanged:: 2.5
				86	This function returned an :ctype:`int` type. This might require changes
				87	in your code for properly supporting 64-bit systems.
				88
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	89
				90	.. cfunction:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
				91
				92	Return a pointer to the internal :ctype:`Py_UNICODE` buffer of the object. o
				93	has to be a :ctype:`PyUnicodeObject` (not checked).
				94
				95
				96	.. cfunction:: const char* PyUnicode_AS_DATA(PyObject *o)
				97
				98	Return a pointer to the internal buffer of the object. o has to be a
				99	:ctype:`PyUnicodeObject` (not checked).
				100
Christian Heimes	3b718a7	2008-02-14 12:47:33 +0000	[diff] [blame]	101
Georg Brandl	36b30b5	2009-07-24 16:46:38 +0000	[diff] [blame]	102	.. cfunction:: int PyUnicode_ClearFreeList()
Christian Heimes	3b718a7	2008-02-14 12:47:33 +0000	[diff] [blame]	103
				104	Clear the free list. Return the total number of freed items.
				105
				106	.. versionadded:: 2.6
				107
Georg Brandl	36b30b5	2009-07-24 16:46:38 +0000	[diff] [blame]	108
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	109	Unicode Character Properties
				110	""""""""""""""""""""""""""""
				111
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	112	Unicode provides many different character properties. The most often needed ones
				113	are available through these macros which are mapped to C functions depending on
				114	the Python configuration.
				115
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	116
				117	.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
				118
				119	Return 1 or 0 depending on whether ch is a whitespace character.
				120
				121
				122	.. cfunction:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
				123
				124	Return 1 or 0 depending on whether ch is a lowercase character.
				125
				126
				127	.. cfunction:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
				128
				129	Return 1 or 0 depending on whether ch is an uppercase character.
				130
				131
				132	.. cfunction:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
				133
				134	Return 1 or 0 depending on whether ch is a titlecase character.
				135
				136
				137	.. cfunction:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
				138
				139	Return 1 or 0 depending on whether ch is a linebreak character.
				140
				141
				142	.. cfunction:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
				143
				144	Return 1 or 0 depending on whether ch is a decimal character.
				145
				146
				147	.. cfunction:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
				148
				149	Return 1 or 0 depending on whether ch is a digit character.
				150
				151
				152	.. cfunction:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
				153
				154	Return 1 or 0 depending on whether ch is a numeric character.
				155
				156
				157	.. cfunction:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
				158
				159	Return 1 or 0 depending on whether ch is an alphabetic character.
				160
				161
				162	.. cfunction:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
				163
				164	Return 1 or 0 depending on whether ch is an alphanumeric character.
				165
				166	These APIs can be used for fast direct character conversions:
				167
				168
				169	.. cfunction:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
				170
				171	Return the character ch converted to lower case.
				172
				173
				174	.. cfunction:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
				175
				176	Return the character ch converted to upper case.
				177
				178
				179	.. cfunction:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
				180
				181	Return the character ch converted to title case.
				182
				183
				184	.. cfunction:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
				185
				186	Return the character ch converted to a decimal positive integer. Return
				187	``-1`` if this is not possible. This macro does not raise exceptions.
				188
				189
				190	.. cfunction:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
				191
				192	Return the character ch converted to a single digit integer. Return ``-1`` if
				193	this is not possible. This macro does not raise exceptions.
				194
				195
				196	.. cfunction:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
				197
				198	Return the character ch converted to a double. Return ``-1.0`` if this is not
				199	possible. This macro does not raise exceptions.
				200
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	201
				202	Plain Py_UNICODE
				203	""""""""""""""""
				204
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	205	To create Unicode objects and access their basic sequence properties, use these
				206	APIs:
				207
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	208
				209	.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
				210
				211	Create a Unicode object from the Py_UNICODE buffer u of the given size. u
				212	may be NULL which causes the contents to be undefined. It is the user's
				213	responsibility to fill in the needed data. The buffer is copied into the new
				214	object. If the buffer is not NULL, the return value might be a shared object.
				215	Therefore, modification of the resulting Unicode object is only allowed when u
				216	is NULL.
				217
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	218	.. versionchanged:: 2.5
				219	This function used an :ctype:`int` type for size. This might require
				220	changes in your code for properly supporting 64-bit systems.
				221
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	222
				223	.. cfunction:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
				224
				225	Return a read-only pointer to the Unicode object's internal :ctype:`Py_UNICODE`
				226	buffer, NULL if unicode is not a Unicode object.
				227
				228
				229	.. cfunction:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
				230
				231	Return the length of the Unicode object.
				232
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	233	.. versionchanged:: 2.5
				234	This function returned an :ctype:`int` type. This might require changes
				235	in your code for properly supporting 64-bit systems.
				236
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	237
				238	.. cfunction:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)
				239
				240	Coerce an encoded object obj to a Unicode object and return a reference with
				241	incremented refcount.
				242
				243	String and other char buffer compatible objects are decoded according to the
				244	given encoding and using the error handling defined by errors. Both can be
				245	NULL to have the interface use the default values (see the next section for
				246	details).
				247
				248	All other objects, including Unicode objects, cause a :exc:`TypeError` to be
				249	set.
				250
				251	The API returns NULL if there was an error. The caller is responsible for
				252	decref'ing the returned objects.
				253
				254
				255	.. cfunction:: PyObject* PyUnicode_FromObject(PyObject *obj)
				256
				257	Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
				258	throughout the interpreter whenever coercion to Unicode is needed.
				259
				260	If the platform supports :ctype:`wchar_t` and provides a header file wchar.h,
				261	Python can interface directly to this type using the following functions.
				262	Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
				263	the system's :ctype:`wchar_t`.
				264
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	265
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	266	wchar_t Support
				267	"""""""""""""""
				268
				269	wchar_t support for platforms which support it:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	270
				271	.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
				272
				273	Create a Unicode object from the :ctype:`wchar_t` buffer w of the given size.
				274	Return NULL on failure.
				275
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	276	.. versionchanged:: 2.5
				277	This function used an :ctype:`int` type for size. This might require
				278	changes in your code for properly supporting 64-bit systems.
				279
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	280
				281	.. cfunction:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)
				282
				283	Copy the Unicode object contents into the :ctype:`wchar_t` buffer w. At most
				284	size :ctype:`wchar_t` characters are copied (excluding a possibly trailing
				285	0-termination character). Return the number of :ctype:`wchar_t` characters
				286	copied or -1 in case of an error. Note that the resulting :ctype:`wchar_t`
				287	string may or may not be 0-terminated. It is the responsibility of the caller
				288	to make sure that the :ctype:`wchar_t` string is 0-terminated in case this is
				289	required by the application.
				290
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	291	.. versionchanged:: 2.5
				292	This function returned an :ctype:`int` type and used an :ctype:`int`
				293	type for size. This might require changes in your code for properly
				294	supporting 64-bit systems.
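
Because termination is left to the caller, a common pattern is to reserve one extra slot and write the terminator explicitly. The sketch below is plain C and self-contained: ``copy_at_most`` is a hypothetical stand-in that only mimics the contract described above (copy at most size characters, no terminator guaranteed), and ``copy_terminated`` is an assumed name for the caller-side wrapper:

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in with the contract described above: copies at
   most size characters and does NOT add a terminator. It exists only
   to keep this sketch self-contained. */
ptrdiff_t
copy_at_most(const wchar_t *src, size_t src_len, wchar_t *w, size_t size)
{
    size_t n = src_len < size ? src_len : size;
    size_t i;

    for (i = 0; i < n; i++)
        w[i] = src[i];
    return (ptrdiff_t)n;
}

/* Caller-side pattern: pass buf_len - 1 as the size, then terminate. */
ptrdiff_t
copy_terminated(const wchar_t *src, size_t src_len,
                wchar_t *buf, size_t buf_len)
{
    ptrdiff_t n;

    if (buf_len == 0)
        return -1;                  /* no room even for the terminator */
    n = copy_at_most(src, src_len, buf, buf_len - 1);
    if (n >= 0)
        buf[n] = L'\0';             /* in bounds: n <= buf_len - 1 */
    return n;
}
```

With the real API the same idea would read ``n = PyUnicode_AsWideChar(unicode, buf, buf_len - 1)`` followed by ``buf[n] = 0`` when ``n`` is not -1.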
				295
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	296
				297	.. _builtincodecs:
				298
				299	Built-in Codecs
				300	^^^^^^^^^^^^^^^
				301
Georg Brandl	d7d4fd7	2009-07-26 14:37:28 +0000	[diff] [blame]	302	Python provides a set of built-in codecs which are written in C for speed. All of
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	303	these codecs are directly usable via the following functions.
				304
				305	Many of the following APIs take two arguments encoding and errors. These
				306	parameters encoding and errors have the same semantics as the ones of the
Georg Brandl	d7d4fd7	2009-07-26 14:37:28 +0000	[diff] [blame]	307	built-in :func:`unicode` Unicode object constructor.
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	308
				309	Setting encoding to NULL causes the default encoding to be used which is
				310	ASCII. The file system calls should use :cdata:`Py_FileSystemDefaultEncoding`
				311	as the encoding for file names. This variable should be treated as read-only: on
				312	some systems, it will be a pointer to a static string, on others, it will change
				313	at run-time (such as when the application invokes setlocale).
				314
				315	Error handling is set by errors which may also be set to NULL meaning to use
				316	the default handling defined for the codec. Default error handling for all
Georg Brandl	d7d4fd7	2009-07-26 14:37:28 +0000	[diff] [blame]	317	built-in codecs is "strict" (:exc:`ValueError` is raised).
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	318
				319	The codecs all use a similar interface. For simplicity, only deviations from
				320	the following generic ones are documented.
				321
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	322
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	323	Generic Codecs
				324	""""""""""""""
				325
				326	These are the generic codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	327
				328
				329	.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
				330
				331	Create a Unicode object by decoding size bytes of the encoded string s.
				332	encoding and errors have the same meaning as the parameters of the same name
Georg Brandl	d7d4fd7	2009-07-26 14:37:28 +0000	[diff] [blame]	333	in the :func:`unicode` built-in function. The codec to be used is looked up
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	334	using the Python codec registry. Return NULL if an exception was raised by
				335	the codec.
				336
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	337	.. versionchanged:: 2.5
				338	This function used an :ctype:`int` type for size. This might require
				339	changes in your code for properly supporting 64-bit systems.
				340
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	341
				342	.. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
				343
				344	Encode the :ctype:`Py_UNICODE` buffer of the given size and return a Python
				345	string object. encoding and errors have the same meaning as the parameters
				346	of the same name in the Unicode :meth:`encode` method. The codec to be used is
				347	looked up using the Python codec registry. Return NULL if an exception was
				348	raised by the codec.
				349
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	350	.. versionchanged:: 2.5
				351	This function used an :ctype:`int` type for size. This might require
				352	changes in your code for properly supporting 64-bit systems.
				353
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	354
				355	.. cfunction:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
				356
				357	Encode a Unicode object and return the result as a Python string object.
				358	encoding and errors have the same meaning as the parameters of the same name
				359	in the Unicode :meth:`encode` method. The codec to be used is looked up using
				360	the Python codec registry. Return NULL if an exception was raised by the
				361	codec.
				362
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	363
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	364	UTF-8 Codecs
				365	""""""""""""
				366
				367	These are the UTF-8 codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	368
				369
				370	.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
				371
				372	Create a Unicode object by decoding size bytes of the UTF-8 encoded string
				373	s. Return NULL if an exception was raised by the codec.
				374
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	375	.. versionchanged:: 2.5
				376	This function used an :ctype:`int` type for size. This might require
				377	changes in your code for properly supporting 64-bit systems.
				378
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	379
				380	.. cfunction:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
				381
				382	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF8`. If
				383	consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be
				384	treated as an error. Those bytes will not be decoded and the number of bytes
				385	that have been decoded will be stored in consumed.
				386
				387	.. versionadded:: 2.4
				388
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	389	.. versionchanged:: 2.5
				390	This function used an :ctype:`int` type for size. This might require
				391	changes in your code for properly supporting 64-bit systems.
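
To make the meaning of consumed concrete, here is a minimal sketch in plain C of how the number of completely decodable bytes could be determined. It is not CPython's implementation: ``utf8_complete_prefix`` is a hypothetical helper, and it assumes the data is otherwise well-formed, so it only detects a sequence that is cut off at the very end of the buffer:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API.
   Return how many leading bytes of s[0..n) consist of complete UTF-8
   sequences. Only a truncated sequence at the end is detected. */
size_t
utf8_complete_prefix(const unsigned char *s, size_t n)
{
    size_t i = n;
    size_t back = 0;
    unsigned char lead;
    size_t need;                /* length announced by the lead byte */

    /* Step back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (s[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return n;               /* no lead byte; leave it to the decoder */

    lead = s[i - 1];
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return n;               /* invalid lead byte, not a truncation */

    /* back + 1 bytes of the last sequence are present. */
    if (back + 1 < need)
        return i - 1;           /* cut off: stop before the lead byte */
    return n;
}
```

A caller that reads data in chunks would keep the bytes from the returned offset onward and prepend them to the next chunk, which is how the consumed value is meant to be used.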
				392
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	393
				394	.. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				395
				396	Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-8 and return a
				397	Python string object. Return NULL if an exception was raised by the codec.
				398
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	399	.. versionchanged:: 2.5
				400	This function used an :ctype:`int` type for size. This might require
				401	changes in your code for properly supporting 64-bit systems.
				402
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	403
				404	.. cfunction:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
				405
				406	Encode a Unicode object using UTF-8 and return the result as a Python string
				407	object. Error handling is "strict". Return NULL if an exception was raised
				408	by the codec.
				409
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	410
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	411	UTF-32 Codecs
				412	"""""""""""""
				413
				414	These are the UTF-32 codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	415
				416
				417	.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
				418
				419	Decode size bytes from a UTF-32 encoded buffer string and return the
				420	corresponding Unicode object. errors (if non-NULL) defines the error
				421	handling. It defaults to "strict".
				422
				423	If byteorder is non-NULL, the decoder starts decoding using the given byte
				424	order::
				425
				426	*byteorder == -1: little endian
				427	*byteorder == 0: native order
				428	*byteorder == 1: big endian
				429
Georg Brandl	579a358	2009-09-18 21:35:59 +0000	[diff] [blame]	430	If ``*byteorder`` is zero, and the first four bytes of the input data are a
				431	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				432	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				433	``1``, any byte order mark is copied to the output.
				434
				435	After completion, ``*byteorder`` is set to the current byte order at the end
				436	of input data.
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	437
				438	In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.
				439
				440	If byteorder is NULL, the codec starts in native order mode.
				441
				442	Return NULL if an exception was raised by the codec.
				443
				444	.. versionadded:: 2.6
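
The BOM handling for ``*byteorder == 0`` can be sketched in plain C as follows. This is an illustration of the rule described above, not CPython's implementation; ``utf32_check_bom`` is a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API.
   Inspect the first four bytes of a UTF-32 buffer. If *byteorder is 0
   and a BOM is present, set *byteorder to -1 (little endian) or 1 (big
   endian) and return 4, the number of bytes to skip. Otherwise leave
   *byteorder unchanged and return 0. */
size_t
utf32_check_bom(const unsigned char *s, size_t size, int *byteorder)
{
    if (*byteorder != 0 || size < 4)
        return 0;               /* explicit order given, or too short */
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
        *byteorder = -1;        /* U+FEFF stored little endian */
        return 4;
    }
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
        *byteorder = 1;         /* U+FEFF stored big endian */
        return 4;
    }
    return 0;
}
```

The sketch does not model the fallback to native order when no BOM is found; it only shows why a BOM is skipped in mode 0 and kept in the output when the caller fixes the order to -1 or 1.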
				445
				446
				447	.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
				448
				449	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
				450	consumed is not NULL, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
				451	trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
				452	by four) as an error. Those bytes will not be decoded and the number of bytes
				453	that have been decoded will be stored in consumed.
				454
				455	.. versionadded:: 2.6
				456
				457
				458	.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
				459
				460	Return a Python string object holding the UTF-32 encoded value of the Unicode
Georg Brandl	579a358	2009-09-18 21:35:59 +0000	[diff] [blame]	461	data in s. Output is written according to the following byte order::
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	462
				463	byteorder == -1: little endian
				464	byteorder == 0: native byte order (writes a BOM mark)
				465	byteorder == 1: big endian
				466
				467	If byteorder is ``0``, the output string will always start with the Unicode BOM
				468	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				469
				470	If Py_UNICODE_WIDE is not defined, surrogate pairs will be output
				471	as a single codepoint.
				472
				473	Return NULL if an exception was raised by the codec.
				474
				475	.. versionadded:: 2.6
				476
				477
				478	.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				479
				480	Return a Python string using the UTF-32 encoding in native byte order. The
				481	string always starts with a BOM mark. Error handling is "strict". Return
				482	NULL if an exception was raised by the codec.
				483
				484	.. versionadded:: 2.6
				485
				486
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	487	UTF-16 Codecs
				488	"""""""""""""
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	489
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	490	These are the UTF-16 codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	491
				492
				493	.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
				494
				495	Decode size bytes from a UTF-16 encoded buffer string and return the
				496	corresponding Unicode object. errors (if non-NULL) defines the error
				497	handling. It defaults to "strict".
				498
				499	If byteorder is non-NULL, the decoder starts decoding using the given byte
				500	order::
				501
				502	*byteorder == -1: little endian
				503	*byteorder == 0: native order
				504	*byteorder == 1: big endian
				505
Georg Brandl	579a358	2009-09-18 21:35:59 +0000	[diff] [blame]	506	If ``*byteorder`` is zero, and the first two bytes of the input data are a
				507	byte order mark (BOM), the decoder switches to this byte order and the BOM is
				508	not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
				509	``1``, any byte order mark is copied to the output (where it will result in
				510	either a ``\ufeff`` or a ``\ufffe`` character).
				511
				512	After completion, ``*byteorder`` is set to the current byte order at the end
				513	of input data.
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	514
				515	If byteorder is NULL, the codec starts in native order mode.
				516
				517	Return NULL if an exception was raised by the codec.
				518
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	519	.. versionchanged:: 2.5
				520	This function used an :ctype:`int` type for size. This might require
				521	changes in your code for properly supporting 64-bit systems.
				522
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	523
				524	.. cfunction:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
				525
				526	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF16`. If
				527	consumed is not NULL, :cfunc:`PyUnicode_DecodeUTF16Stateful` will not treat
				528	trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
				529	split surrogate pair) as an error. Those bytes will not be decoded and the
				530	number of bytes that have been decoded will be stored in consumed.
				531
				532	.. versionadded:: 2.4
				533
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	534	.. versionchanged:: 2.5
				535	This function used an :ctype:`int` type for size and an :ctype:`int *`
				536	type for consumed. This might require changes in your code for
				537	properly supporting 64-bit systems.
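
The two kinds of incomplete trailing data named above, an odd byte and a split surrogate pair, can be illustrated with a short plain C sketch. It is not CPython's implementation: ``utf16_complete_prefix`` is a hypothetical helper, the byte order is passed in explicitly, and the input is assumed to be otherwise well-formed:

```c
#include <stddef.h>

/* Hypothetical helper, not part of the Python C API.
   Return the number of leading bytes of s[0..n) that form complete
   UTF-16 code points. little_endian selects the byte order. */
size_t
utf16_complete_prefix(const unsigned char *s, size_t n, int little_endian)
{
    size_t usable = n & ~(size_t)1;     /* drop a dangling odd byte */
    unsigned int unit;

    if (usable < 2)
        return 0;
    /* Read the last complete 16-bit unit. */
    if (little_endian)
        unit = s[usable - 2] | ((unsigned int)s[usable - 1] << 8);
    else
        unit = ((unsigned int)s[usable - 2] << 8) | s[usable - 1];
    /* A trailing high surrogate (U+D800..U+DBFF) is half of a pair. */
    if (unit >= 0xD800 && unit <= 0xDBFF)
        return usable - 2;
    return usable;
}
```

The returned count plays the role of consumed: the remaining bytes would be kept and decoded together with the next chunk of input.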
				538
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	539
				540	.. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
				541
				542	Return a Python string object holding the UTF-16 encoded value of the Unicode
Georg Brandl	579a358	2009-09-18 21:35:59 +0000	[diff] [blame]	543	data in s. Output is written according to the following byte order::
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	544
				545	byteorder == -1: little endian
				546	byteorder == 0: native byte order (writes a BOM mark)
				547	byteorder == 1: big endian
				548
				549	If byteorder is ``0``, the output string will always start with the Unicode BOM
				550	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				551
				552	If Py_UNICODE_WIDE is defined, a single :ctype:`Py_UNICODE` value may get
				553	represented as a surrogate pair. If it is not defined, each :ctype:`Py_UNICODE`
				554	value is interpreted as a UCS-2 character.
				555
				556	Return NULL if an exception was raised by the codec.
				557
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	558	.. versionchanged:: 2.5
				559	This function used an :ctype:`int` type for size. This might require
				560	changes in your code for properly supporting 64-bit systems.
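
The interaction between byteorder, the BOM and surrogate pairs can be sketched in plain C. This is an illustration under stated assumptions, not CPython's implementation: ``utf16_encode`` and ``put_unit`` are hypothetical names, the input is an array of 32-bit code points (as in a wide build), and every value is assumed to be a valid non-surrogate code point:

```c
#include <stddef.h>
#include <stdint.h>

/* Store one 16-bit unit at out[0..1] in the requested byte order. */
static void
put_unit(unsigned char *out, unsigned int unit, int little_endian)
{
    out[little_endian ? 0 : 1] = (unsigned char)(unit & 0xFF);
    out[little_endian ? 1 : 0] = (unsigned char)(unit >> 8);
}

/* Hypothetical helper, not part of the Python C API.
   Encode n code points as UTF-16 into out, which must have room for
   2 + 4 * n bytes. byteorder: -1 little endian, 1 big endian, 0 native
   order with a leading BOM. Returns the number of bytes written. */
size_t
utf16_encode(const uint32_t *cp, size_t n, int byteorder, unsigned char *out)
{
    const uint16_t probe = 1;
    int native_le = (*(const unsigned char *)&probe == 1);
    int le = (byteorder == 0) ? native_le : (byteorder < 0);
    size_t len = 0;
    size_t i;

    if (byteorder == 0) {               /* only mode 0 writes a BOM */
        put_unit(out + len, 0xFEFF, le);
        len += 2;
    }
    for (i = 0; i < n; i++) {
        uint32_t c = cp[i];

        if (c >= 0x10000) {             /* needs a surrogate pair */
            c -= 0x10000;
            put_unit(out + len, 0xD800 | (c >> 10), le);
            put_unit(out + len + 2, 0xDC00 | (c & 0x3FF), le);
            len += 4;
        }
        else {
            put_unit(out + len, c, le);
            len += 2;
        }
    }
    return len;
}
```

In a narrow build the input buffer already holds 16-bit units, so the surrogate branch would never be taken and each unit would be written out unchanged.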
				561
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	562
				563	.. cfunction:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				564
				565	Return a Python string using the UTF-16 encoding in native byte order. The
				566	string always starts with a BOM mark. Error handling is "strict". Return
				567	NULL if an exception was raised by the codec.
				568
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	569
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	570	Unicode-Escape Codecs
				571	"""""""""""""""""""""
				572
				573	These are the "Unicode Escape" codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	574
				575
				576	.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
				577
				578	Create a Unicode object by decoding size bytes of the Unicode-Escape encoded
				579	string s. Return NULL if an exception was raised by the codec.
				580
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	581	.. versionchanged:: 2.5
				582	This function used an :ctype:`int` type for size. This might require
				583	changes in your code for properly supporting 64-bit systems.
				584
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	585
				586	.. cfunction:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
				587
				588	Encode the :ctype:`Py_UNICODE` buffer of the given size using Unicode-Escape and
				589	return a Python string object. Return NULL if an exception was raised by the
				590	codec.
				591
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	592	.. versionchanged:: 2.5
				593	This function used an :ctype:`int` type for size. This might require
				594	changes in your code for properly supporting 64-bit systems.
				595
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	596
				597	.. cfunction:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				598
				599	Encode a Unicode object using Unicode-Escape and return the result as a Python
				600	string object. Error handling is "strict". Return NULL if an exception was
				601	raised by the codec.
				602
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	603
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	604	Raw-Unicode-Escape Codecs
				605	"""""""""""""""""""""""""
				606
				607	These are the "Raw Unicode Escape" codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	608
				609
				610	.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
				611
				612	Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape
				613	encoded string s. Return NULL if an exception was raised by the codec.
				614
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	615	.. versionchanged:: 2.5
				616	This function used an :ctype:`int` type for size. This might require
				617	changes in your code for properly supporting 64-bit systems.
				618
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	619
				620	.. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				621
				622	Encode the :ctype:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
				623	and return a Python string object. Return NULL if an exception was raised by
				624	the codec.
				625
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	626	.. versionchanged:: 2.5
				627	This function used an :ctype:`int` type for size. This might require
				628	changes in your code for properly supporting 64-bit systems.
				629
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	630
				631	.. cfunction:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				632
				633	Encode a Unicode object using Raw-Unicode-Escape and return the result as a
				634	Python string object. Error handling is "strict". Return NULL if an exception
				635	was raised by the codec.
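
To make the output format concrete, here is a minimal standalone sketch of what Raw-Unicode-Escape encoding produces. It is not CPython's implementation and does not use the Python C API: ``raw_unicode_escape`` is a hypothetical helper, the input is assumed to be an array of full code points (so surrogate pairs on UCS2 builds are not modelled), and only the well-known escape forms are shown (ordinals below 256 as raw bytes, ``\uXXXX`` inside the BMP, ``\UXXXXXXXX`` above it).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper, not part of the Python C API.
 * Writes the Raw-Unicode-Escape form of the n code points in s to out
 * (capacity cap bytes, including the terminating NUL).
 * Returns the number of bytes written, or -1 if out is too small. */
static long raw_unicode_escape(const unsigned long *s, size_t n,
                               char *out, size_t cap)
{
    size_t pos = 0;

    assert(s != NULL && out != NULL);
    if (cap == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        size_t room = cap - pos;      /* at least 1: pos < cap holds here */
        if (s[i] < 256) {
            /* Latin-1 range: emitted as a single raw byte. */
            if (room < 2)
                return -1;
            out[pos++] = (char)(unsigned char)s[i];
        } else {
            int w;
            if (s[i] > 0xFFFF)        /* outside the BMP: 8 hex digits */
                w = snprintf(out + pos, room, "\\U%08lx", s[i]);
            else                      /* inside the BMP: 4 hex digits */
                w = snprintf(out + pos, room, "\\u%04lx", s[i]);
            if (w < 0 || (size_t)w >= room)
                return -1;
            pos += (size_t)w;
        }
    }
    out[pos] = '\0';
    return (long)pos;
}
```

Under these assumptions, the three code points U+0041, U+00E9 and U+20AC are written as the byte ``A``, the byte 0xE9, and the six characters ``\u20ac``.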
				636
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	637
				638	Latin-1 Codecs
				639	""""""""""""""
				640
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	641	These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
				642	ordinals, and only these are accepted by the codecs during encoding.
				643
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	644
				645	.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
				646
				647	Create a Unicode object by decoding size bytes of the Latin-1 encoded string
				648	s. Return NULL if an exception was raised by the codec.
				649
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	650	.. versionchanged:: 2.5
				651	This function used an :ctype:`int` type for size. This might require
				652	changes in your code for properly supporting 64-bit systems.
				653
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	654
				655	.. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				656
				657	Encode the :ctype:`Py_UNICODE` buffer of the given size using Latin-1 and return
				658	a Python string object. Return NULL if an exception was raised by the codec.
				659
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	660	.. versionchanged:: 2.5
				661	This function used an :ctype:`int` type for size. This might require
				662	changes in your code for properly supporting 64-bit systems.
				663
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	664
				665	.. cfunction:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				666
				667	Encode a Unicode object using Latin-1 and return the result as a Python string
				668	object. Error handling is "strict". Return NULL if an exception was raised
				669	by the codec.
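
The rule that only the first 256 ordinals are accepted can be sketched in plain C. This is a simplified model of strict Latin-1 encoding and not CPython code. ``latin1_encode_strict`` and its ``bad`` out-parameter are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of strict Latin-1 encoding, not CPython code.
 * Copies the n ordinals in s to out as single bytes.
 * Returns 0 on success, or -1 after storing the index of the first
 * ordinal that does not fit in Latin-1 in *bad. */
static int latin1_encode_strict(const unsigned long *s, size_t n,
                                unsigned char *out, size_t *bad)
{
    assert(s != NULL && out != NULL && bad != NULL);
    for (size_t i = 0; i < n; i++) {
        if (s[i] > 0xFF) {
            *bad = i;                 /* ordinal outside Latin-1 */
            return -1;
        }
        out[i] = (unsigned char)s[i]; /* ordinal equals the byte value */
    }
    return 0;
}
```

In the real API the failure case corresponds to a NULL return with an exception set.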
				670
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	671
				672	ASCII Codecs
				673	""""""""""""
				674
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	675	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				676	codes generate errors.
				677
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	678
				679	.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
				680
				681	Create a Unicode object by decoding size bytes of the ASCII encoded string
				682	s. Return NULL if an exception was raised by the codec.
				683
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	684	.. versionchanged:: 2.5
				685	This function used an :ctype:`int` type for size. This might require
				686	changes in your code for properly supporting 64-bit systems.
				687
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	688
				689	.. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				690
				691	Encode the :ctype:`Py_UNICODE` buffer of the given size using ASCII and return a
				692	Python string object. Return NULL if an exception was raised by the codec.
				693
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	694	.. versionchanged:: 2.5
				695	This function used an :ctype:`int` type for size. This might require
				696	changes in your code for properly supporting 64-bit systems.
				697
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	698
				699	.. cfunction:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				700
				701	Encode a Unicode object using ASCII and return the result as a Python string
				702	object. Error handling is "strict". Return NULL if an exception was raised
				703	by the codec.
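
The effect of the errors argument on 7-bit decoding can be modelled as follows. This is a standalone sketch and not the codec's implementation. ``ascii_decode`` and its ``replace`` flag are hypothetical, standing in for the standard ``"strict"`` and ``"replace"`` handler names of Python's codec machinery, which this section does not define.

```c
#include <assert.h>
#include <stddef.h>

#define REPLACEMENT_CHAR 0xFFFDUL

/* Hypothetical model, not CPython code.  Decodes n ASCII bytes into out.
 * With replace == 0 (like errors="strict") the first byte above 0x7F
 * aborts with -1; otherwise (like errors="replace") it becomes U+FFFD.
 * Returns the number of ordinals written on success. */
static long ascii_decode(const unsigned char *s, size_t n,
                         unsigned long *out, int replace)
{
    assert(s != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 0x80)
            out[i] = s[i];            /* 7-bit data maps to itself */
        else if (replace)
            out[i] = REPLACEMENT_CHAR;
        else
            return -1;                /* strict: report an error */
    }
    return (long)n;
}
```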
				704
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	705
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	706	Character Map Codecs
				707	""""""""""""""""""""
				708
				709	These are the mapping codec APIs:
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	710
				711	This codec is special in that it can be used to implement many different codecs
				712	(and this is in fact what was done to obtain most of the standard codecs
				713	included in the :mod:`encodings` package). The codec uses mappings to encode and
				714	decode characters.
				715
				716	Decoding mappings must map single string characters to single Unicode
				717	characters, integers (which are then interpreted as Unicode ordinals) or None
				718	(meaning "undefined mapping" and causing an error).
				719
				720	Encoding mappings must map single Unicode characters to single string
				721	characters, integers (which are then interpreted as Latin-1 ordinals) or None
				722	(meaning "undefined mapping" and causing an error).
				723
				724	The mapping objects provided need only support the __getitem__ mapping
				725	interface.
				726
				727	If a character lookup fails with a LookupError, the character is copied as-is,
				728	meaning that its ordinal value will be interpreted as a Unicode ordinal (when
				729	decoding) or a Latin-1 ordinal (when encoding). Because of this, mappings only
				730	need to contain entries for characters that map to different code points.
				731
				732
				733	.. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				734
				735	Create a Unicode object by decoding size bytes of the encoded string s using
				736	the given mapping object. Return NULL if an exception was raised by the
				737	codec. If mapping is NULL, Latin-1 decoding will be done. Otherwise it can be a
				738	dictionary that maps bytes, or a Unicode string that is treated as a lookup table.
				739	Byte values greater than or equal to the length of the string and U+FFFE
				740	"characters" are treated as "undefined mapping".
				741
				742	.. versionchanged:: 2.4
				743	Allowed unicode string as mapping argument.
				744
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	745	.. versionchanged:: 2.5
				746	This function used an :ctype:`int` type for size. This might require
				747	changes in your code for properly supporting 64-bit systems.
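
The lookup-table form of the mapping argument can be sketched in standalone C. This is a model of the rule described for Unicode string mappings and not the codec itself. ``charmap_decode`` is a hypothetical name, and 16-bit table entries are assumed, as on a UCS2 build.

```c
#include <assert.h>
#include <stddef.h>

#define UNDEFINED 0xFFFEu

/* Hypothetical model, not CPython code.  Decodes n bytes through a
 * lookup table of table_len 16-bit ordinals.  A byte that indexes past
 * the end of the table, or whose entry is U+FFFE, is an undefined
 * mapping and makes the function return -1.  Returns 0 on success. */
static int charmap_decode(const unsigned char *s, size_t n,
                          const unsigned short *table, size_t table_len,
                          unsigned short *out)
{
    assert(s != NULL && table != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned short x = UNDEFINED;
        if (s[i] < table_len)
            x = table[s[i]];          /* in range: take the table entry */
        if (x == UNDEFINED)
            return -1;                /* undefined mapping */
        out[i] = x;
    }
    return 0;
}
```

With a three-entry table, byte value 3 falls outside the table and is therefore undefined, just like a byte whose entry is U+FFFE.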
				748
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	749
				750	.. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				751
				752	Encode the :ctype:`Py_UNICODE` buffer of the given size using the given
				753	mapping object and return a Python string object. Return NULL if an
				754	exception was raised by the codec.
				755
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	756	.. versionchanged:: 2.5
				757	This function used an :ctype:`int` type for size. This might require
				758	changes in your code for properly supporting 64-bit systems.
				759
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	760
				761	.. cfunction:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
				762
				763	Encode a Unicode object using the given mapping object and return the result
				764	as a Python string object. Error handling is "strict". Return NULL if an
				765	exception was raised by the codec.
				766
				767	The following codec API is special in that it maps Unicode to Unicode.
				768
				769
				770	.. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
				771
				772	Translate a :ctype:`Py_UNICODE` buffer of the given length by applying a
				773	character mapping table to it and return the resulting Unicode object. Return
				774	NULL when an exception was raised by the codec.
				775
				776	The mapping table must map Unicode ordinal integers to Unicode ordinal
				777	integers or None (causing deletion of the character).
				778
				779	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				780	and sequences work well. Unmapped character ordinals (ones which cause a
				781	:exc:`LookupError`) are left untouched and are copied as-is.
				782
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	783	.. versionchanged:: 2.5
				784	This function used an :ctype:`int` type for size. This might require
				785	changes in your code for properly supporting 64-bit systems.
				786
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	787	These are the MBCS codec APIs. They are currently only available on Windows and
				788	use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
				789	DBCS) is a class of encodings, not just one. The target encoding is defined by
				790	the user settings on the machine running the codec.
				791
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	792
				793	MBCS codecs for Windows
				794	"""""""""""""""""""""""
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	795
				796
				797	.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
				798
				799	Create a Unicode object by decoding size bytes of the MBCS encoded string s.
				800	Return NULL if an exception was raised by the codec.
				801
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	802	.. versionchanged:: 2.5
				803	This function used an :ctype:`int` type for size. This might require
				804	changes in your code for properly supporting 64-bit systems.
				805
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	806
				807	.. cfunction:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
				808
				809	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeMBCS`. If
				810	consumed is not NULL, :cfunc:`PyUnicode_DecodeMBCSStateful` will not decode a
				811	trailing lead byte, and the number of bytes that have been decoded will be stored
				812	in consumed.
				813
				814	.. versionadded:: 2.5
				815
				816
				817	.. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				818
				819	Encode the :ctype:`Py_UNICODE` buffer of the given size using MBCS and return a
				820	Python string object. Return NULL if an exception was raised by the codec.
				821
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	822	.. versionchanged:: 2.5
				823	This function used an :ctype:`int` type for size. This might require
				824	changes in your code for properly supporting 64-bit systems.
				825
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	826
				827	.. cfunction:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
				828
				829	Encode a Unicode object using MBCS and return the result as a Python string
				830	object. Error handling is "strict". Return NULL if an exception was raised
				831	by the codec.
				832
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	833
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	834	Methods & Slots
				835	"""""""""""""""
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	836
				837	.. _unicodemethodsandslots:
				838
				839	Methods and Slot Functions
				840	^^^^^^^^^^^^^^^^^^^^^^^^^^
				841
				842	The following APIs are capable of handling Unicode objects and strings on input
				843	(we refer to them as strings in the descriptions) and return Unicode objects or
				844	integers as appropriate.
				845
				846	They all return NULL or ``-1`` if an exception occurs.
				847
				848
				849	.. cfunction:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
				850
				851	Concatenate two strings, giving a new Unicode string.
				852
				853
				854	.. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
				855
				856	Split a string giving a list of Unicode strings. If sep is NULL, splitting
				857	will be done at all whitespace substrings. Otherwise, splits occur at the given
				858	separator. At most maxsplit splits will be done. If negative, no limit is
				859	set. Separators are not included in the resulting list.
				860
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	861	.. versionchanged:: 2.5
				862	This function used an :ctype:`int` type for maxsplit. This might require
				863	changes in your code for properly supporting 64-bit systems.
				864
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	865
				866	.. cfunction:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
				867
				868	Split a Unicode string at line breaks, returning a list of Unicode strings.
				869	CRLF is considered to be one line break. If keepend is 0, the line break
				870	characters are not included in the resulting strings.
				871
				872
				873	.. cfunction:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
				874
				875	Translate a string by applying a character mapping table to it and return the
				876	resulting Unicode object.
				877
				878	The mapping table must map Unicode ordinal integers to Unicode ordinal integers
				879	or None (causing deletion of the character).
				880
				881	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				882	and sequences work well. Unmapped character ordinals (ones which cause a
				883	:exc:`LookupError`) are left untouched and are copied as-is.
				884
				885	errors has the usual meaning for codecs. It may be NULL, which selects the
				886	default error handling.
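
The three table outcomes described here (replace, delete, leave untouched) can be modelled in a few lines of standalone C. This is a sketch and not the implementation. ``translate``, ``struct map_entry`` and the ``DELETE`` marker (standing in for a None entry) are hypothetical, and a linear search replaces the ``__getitem__`` lookup.

```c
#include <assert.h>
#include <stddef.h>

#define DELETE (-1L)                  /* stands in for a None entry */

struct map_entry { unsigned long from; long to; };

/* Hypothetical model, not CPython code.  Applies table to the n
 * ordinals in s and writes the result to out (room for n ordinals).
 * Returns the number of ordinals written. */
static size_t translate(const unsigned long *s, size_t n,
                        const struct map_entry *table, size_t table_len,
                        unsigned long *out)
{
    size_t written = 0;

    assert(s != NULL && table != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned long ch = s[i];
        int drop = 0;
        for (size_t j = 0; j < table_len; j++) {
            if (table[j].from == ch) {
                if (table[j].to == DELETE)
                    drop = 1;         /* None: delete the character */
                else
                    ch = (unsigned long)table[j].to;
                break;
            }
        }
        if (!drop)
            out[written++] = ch;      /* unmapped ordinals pass through */
    }
    return written;
}
```

Translating the ordinals of ``abc`` with a table that maps ``a`` to ``A`` and ``b`` to the deletion marker yields the ordinals of ``Ac``.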
				887
				888
				889	.. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
				890
				891	Join a sequence of strings using the given separator and return the resulting
				892	Unicode string.
				893
				894
				895	.. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				896
				897	Return 1 if substr matches str[start:end] at the given tail end
				898	(direction == -1 means to do a prefix match, direction == 1 a suffix match),
				899	0 otherwise. Return ``-1`` if an error occurred.
				900
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	901	.. versionchanged:: 2.5
				902	This function used an :ctype:`int` type for start and end. This
				903	might require changes in your code for properly supporting 64-bit
				904	systems.
				905
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	906
				907	.. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				908
				909	Return the first position of substr in str[start:end] using the given
				910	direction (direction == 1 means to do a forward search, direction == -1 a
				911	backward search). The return value is the index of the first match; a value of
				912	``-1`` indicates that no match was found, and ``-2`` indicates that an error
				913	occurred and an exception has been set.
				914
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	915	.. versionchanged:: 2.5
				916	This function used an :ctype:`int` type for start and end. This
				917	might require changes in your code for properly supporting 64-bit
				918	systems.
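
The meaning of direction and of the ``-1`` result can be illustrated with a standalone model on plain C strings. This is a sketch and not the Unicode implementation. ``find`` is a hypothetical helper, the caller is assumed to pass ``0 <= start <= end <= strlen(str)``, and the ``-2`` error case has no counterpart here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not CPython code.  Searches str[start:end] for
 * sub.  direction > 0 scans forward and returns the lowest matching
 * index; otherwise it scans backward and returns the highest one.
 * Indices are relative to the start of str.  Returns -1 if there is
 * no match. */
static long find(const char *str, const char *sub,
                 long start, long end, int direction)
{
    long sublen;

    assert(str != NULL && sub != NULL && start >= 0 && start <= end);
    sublen = (long)strlen(sub);
    if (end - start < sublen)
        return -1;                    /* slice shorter than sub */
    if (direction > 0) {
        for (long i = start; i + sublen <= end; i++)
            if (memcmp(str + i, sub, (size_t)sublen) == 0)
                return i;
    } else {
        for (long i = end - sublen; i >= start; i--)
            if (memcmp(str + i, sub, (size_t)sublen) == 0)
                return i;
    }
    return -1;
}
```

For ``"abcabc"`` and ``"bc"`` over the whole string, the forward search reports index 1 and the backward search reports index 4.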
				919
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	920
				921	.. cfunction:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
				922
				923	Return the number of non-overlapping occurrences of substr in
				924	``str[start:end]``. Return ``-1`` if an error occurred.
				925
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	926	.. versionchanged:: 2.5
				927	This function returned an :ctype:`int` type and used an :ctype:`int`
				928	type for start and end. This might require changes in your code for
				929	properly supporting 64-bit systems.
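
"Non-overlapping" means that the scan resumes after the end of each match. The standalone model below shows this on plain C strings. It is a sketch and not the Unicode implementation. ``count`` is a hypothetical helper that assumes a non-empty substring and ``0 <= start <= end <= strlen(str)``.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not CPython code.  Counts the non-overlapping
 * occurrences of sub in str[start:end].  Assumes sub is non-empty. */
static long count(const char *str, const char *sub, long start, long end)
{
    long sublen, n = 0, i = start;

    assert(str != NULL && sub != NULL && start >= 0 && start <= end);
    sublen = (long)strlen(sub);
    assert(sublen > 0);
    while (i + sublen <= end) {
        if (memcmp(str + i, sub, (size_t)sublen) == 0) {
            n++;
            i += sublen;              /* skip past the whole match */
        } else {
            i++;
        }
    }
    return n;
}
```

In ``"aaaa"`` the substring ``"aa"`` is counted twice, not three times, because the match starting at index 1 overlaps the first one.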
				930
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	931
				932	.. cfunction:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
				933
				934	Replace at most maxcount occurrences of substr in str with replstr and
				935	return the resulting Unicode object. maxcount == -1 means replace all
				936	occurrences.
				937
Jeroen Ruigrok van der Werven	dfcffd4	2009-04-25 21:16:05 +0000	[diff] [blame]	938	.. versionchanged:: 2.5
				939	This function used an :ctype:`int` type for maxcount. This might
				940	require changes in your code for properly supporting 64-bit systems.
				941
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	942
				943	.. cfunction:: int PyUnicode_Compare(PyObject *left, PyObject *right)
				944
				945	Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
				946	respectively.
				947
				948
				949	.. cfunction:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
				950
				951	Rich compare two unicode strings and return one of the following:
				952
				953	* ``NULL`` in case an exception was raised
				954	* :const:`Py_True` or :const:`Py_False` for successful comparisons
				955	* :const:`Py_NotImplemented` in case the type combination is unknown
				956
				957	Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
				958	:exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
				959	with a :exc:`UnicodeDecodeError`.
				960
				961	Possible values for op are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
				962	:const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
				963
				964
				965	.. cfunction:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
				966
				967	Return a new string object from format and args; this is analogous to
				968	``format % args``. The args argument must be a tuple.
				969
				970
				971	.. cfunction:: int PyUnicode_Contains(PyObject *container, PyObject *element)
				972
				973	Check whether element is contained in container and return true or false
				974	accordingly.
				975
				976	element has to coerce to a one element Unicode string. ``-1`` is returned if
				977	there was an error.