.. highlightlang:: c

.. _unicodeobjects:

Unicode Objects and Codecs
--------------------------

.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>

Unicode Objects
^^^^^^^^^^^^^^^


Unicode Type
""""""""""""

These are the basic Unicode object types used for the Unicode implementation in
Python:


.. ctype:: Py_UNICODE

   This type represents the storage type which is used by Python internally as
   the basis for holding Unicode ordinals. Python's default builds use a 16-bit
   type for :ctype:`Py_UNICODE` and store Unicode values internally as UCS2. It
   is also possible to build a UCS4 version of Python (most recent Linux
   distributions come with UCS4 builds of Python). These builds then use a
   32-bit type for :ctype:`Py_UNICODE` and store Unicode data internally as
   UCS4. On platforms where :ctype:`wchar_t` is available and compatible with
   the chosen Python Unicode build variant, :ctype:`Py_UNICODE` is a typedef
   alias for :ctype:`wchar_t` to enhance native platform compatibility. On all
   other platforms, :ctype:`Py_UNICODE` is a typedef alias for either
   :ctype:`unsigned short` (UCS2) or :ctype:`unsigned long` (UCS4).

Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
this in mind when writing extensions or interfaces.


.. ctype:: PyUnicodeObject

   This subtype of :ctype:`PyObject` represents a Python Unicode object.


.. cvar:: PyTypeObject PyUnicode_Type

   This instance of :ctype:`PyTypeObject` represents the Python Unicode type. It
   is exposed to Python code as ``unicode`` and ``types.UnicodeType``.

The following APIs are really C macros and can be used to do fast checks and to
access internal read-only data of Unicode objects:


.. cfunction:: int PyUnicode_Check(PyObject *o)

   Return true if the object *o* is a Unicode object or an instance of a Unicode
   subtype.

   .. versionchanged:: 2.2
      Allowed subtypes to be accepted.


.. cfunction:: int PyUnicode_CheckExact(PyObject *o)

   Return true if the object *o* is a Unicode object, but not an instance of a
   subtype.

   .. versionadded:: 2.2


.. cfunction:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)

   Return the size of the object, counted in :ctype:`Py_UNICODE` units. *o* has
   to be a :ctype:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. cfunction:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)

   Return the size of the object's internal buffer in bytes. *o* has to be a
   :ctype:`PyUnicodeObject` (not checked).

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.
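As a rough illustration of how these two macros relate (in the CPython 2.x headers, :cfunc:`PyUnicode_GET_DATA_SIZE` is the unit count from :cfunc:`PyUnicode_GET_SIZE` multiplied by ``sizeof(Py_UNICODE)``), the standalone sketch below models both for a string holding a single code point. It uses no Python headers. The function names are hypothetical, and the :ctype:`Py_UNICODE` width is passed in as a parameter so that UCS2 and UCS4 builds can be compared side by side.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of PyUnicode_GET_SIZE for a string holding one code
   point cp. unit_size stands for sizeof(Py_UNICODE): 2 on UCS2 builds,
   4 on UCS4 builds. A UCS2 build stores a code point above U+FFFF as a
   surrogate pair, i.e. as two Py_UNICODE units. */
size_t model_get_size(uint32_t cp, size_t unit_size)
{
    return (unit_size == 2 && cp > 0xFFFF) ? 2 : 1;
}

/* Hypothetical model of PyUnicode_GET_DATA_SIZE: the unit count from
   model_get_size() multiplied by sizeof(Py_UNICODE). */
size_t model_get_data_size(uint32_t cp, size_t unit_size)
{
    return model_get_size(cp, unit_size) * unit_size;
}
```

For U+1F600 the model gives a size of 2 and a data size of 4 bytes on a UCS2 build. On a UCS4 build it gives a size of 1 and a data size of 4 bytes. Code written for UCS2 builds therefore should not assume one unit per character.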


.. cfunction:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)

   Return a pointer to the internal :ctype:`Py_UNICODE` buffer of the object. *o*
   has to be a :ctype:`PyUnicodeObject` (not checked).


.. cfunction:: const char* PyUnicode_AS_DATA(PyObject *o)

   Return a pointer to the internal buffer of the object. *o* has to be a
   :ctype:`PyUnicodeObject` (not checked).


.. cfunction:: int PyUnicode_ClearFreeList()

   Clear the free list. Return the total number of freed items.

   .. versionadded:: 2.6


Unicode Character Properties
""""""""""""""""""""""""""""

Unicode provides many different character properties. The most often needed ones
are available through these macros which are mapped to C functions depending on
the Python configuration.


.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a whitespace character.


.. cfunction:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a lowercase character.


.. cfunction:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an uppercase character.


.. cfunction:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a titlecase character.


.. cfunction:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a linebreak character.


.. cfunction:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a decimal character.


.. cfunction:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a digit character.


.. cfunction:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is a numeric character.


.. cfunction:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphabetic character.


.. cfunction:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)

   Return 1 or 0 depending on whether *ch* is an alphanumeric character.

These APIs can be used for fast direct character conversions:


.. cfunction:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)

   Return the character *ch* converted to lower case.


.. cfunction:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)

   Return the character *ch* converted to upper case.


.. cfunction:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)

   Return the character *ch* converted to title case.


.. cfunction:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)

   Return the character *ch* converted to a decimal positive integer. Return
   ``-1`` if this is not possible. This macro does not raise exceptions.


.. cfunction:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)

   Return the character *ch* converted to a single digit integer. Return ``-1`` if
   this is not possible. This macro does not raise exceptions.


.. cfunction:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)

   Return the character *ch* converted to a double. Return ``-1.0`` if this is not
   possible. This macro does not raise exceptions.
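These macros signal failure through ``-1`` rather than an exception. A caller can therefore scan a buffer and stop at the first character that does not convert. The sketch below shows that pattern with a hypothetical, ASCII-only stand-in for :cfunc:`Py_UNICODE_TODECIMAL`. The real macro consults the Unicode database and also accepts decimal digits from other scripts. Characters are modelled as ``uint32_t`` so the example compiles without Python headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ASCII-only stand-in for Py_UNICODE_TODECIMAL: returns the
   digit value 0-9, or -1 if ch is not an ASCII decimal digit. Like the
   macro, it never raises an error. */
int ascii_todecimal(uint32_t ch)
{
    return (ch >= '0' && ch <= '9') ? (int)(ch - '0') : -1;
}

/* Accumulate the value of the leading decimal digits of buf, stopping at
   the first character for which the converter returns -1. *used receives
   the number of characters consumed. Overflow is not checked. */
long parse_leading_decimal(const uint32_t *buf, size_t len, size_t *used)
{
    long value = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        int d = ascii_todecimal(buf[i]);
        if (d == -1)
            break;
        value = value * 10 + d;
    }
    *used = i;
    return value;
}
```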


Plain Py_UNICODE
""""""""""""""""

To create Unicode objects and access their basic sequence properties, use these
APIs:


.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)

   Create a Unicode object from the :ctype:`Py_UNICODE` buffer *u* of the given
   size. *u* may be *NULL*, which causes the contents to be undefined; it is the
   user's responsibility to fill in the needed data. The buffer is copied into
   the new object. If the buffer is not *NULL*, the return value might be a
   shared object. Therefore, modification of the resulting Unicode object is only
   allowed when *u* is *NULL*.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)

   Return a read-only pointer to the Unicode object's internal :ctype:`Py_UNICODE`
   buffer, or *NULL* if *unicode* is not a Unicode object.


.. cfunction:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)

   Return the length of the Unicode object.

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type. This might require changes
      in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)

   Coerce an encoded object *obj* to a Unicode object and return a reference with
   incremented refcount.

   String and other char buffer compatible objects are decoded according to the
   given *encoding* and using the error handling defined by *errors*. Both can be
   *NULL* to have the interface use the default values (see the next section for
   details).

   All other objects, including Unicode objects, cause a :exc:`TypeError` to be
   set.

   The API returns *NULL* if there was an error. The caller is responsible for
   decref'ing the returned objects.


.. cfunction:: PyObject* PyUnicode_FromObject(PyObject *obj)

   Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
   throughout the interpreter whenever coercion to Unicode is needed.

If the platform supports :ctype:`wchar_t` and provides a header file ``wchar.h``,
Python can interface directly to this type using the following functions.
Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
the system's :ctype:`wchar_t`.


wchar_t Support
"""""""""""""""

:ctype:`wchar_t` support for platforms which support it:

.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)

   Create a Unicode object from the :ctype:`wchar_t` buffer *w* of the given
   *size*. Return *NULL* on failure.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)

   Copy the Unicode object contents into the :ctype:`wchar_t` buffer *w*. At most
   *size* :ctype:`wchar_t` characters are copied (excluding a possibly trailing
   0-termination character). Return the number of :ctype:`wchar_t` characters
   copied or -1 in case of an error. Note that the resulting :ctype:`wchar_t`
   string may or may not be 0-terminated. It is the responsibility of the caller
   to make sure that the :ctype:`wchar_t` string is 0-terminated in case this is
   required by the application.

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type and used an :ctype:`int`
      type for *size*. This might require changes in your code for properly
      supporting 64-bit systems.
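The copied string may not be 0-terminated. One common pattern is therefore to pass one less than the buffer capacity as *size* and write the terminator yourself. The standalone sketch below illustrates that pattern. ``copy_at_most()`` is a hypothetical stand-in that follows the documented copy rule. It is used in place of the real call so the example compiles without Python headers.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-in that follows the documented copy rule of
   PyUnicode_AsWideChar: copy at most size units from src into w, return
   the number copied, and do not append a terminating 0. (The real
   function takes a Unicode object and can also return -1 on error.) */
size_t copy_at_most(const wchar_t *src, size_t srclen, wchar_t *w, size_t size)
{
    size_t n = srclen < size ? srclen : size;
    size_t i;
    for (i = 0; i < n; i++)
        w[i] = src[i];
    return n;
}

/* Caller-side pattern: reserve one slot of the buffer for the terminator
   and write it explicitly after the copy. */
size_t copy_terminated(const wchar_t *src, size_t srclen, wchar_t *w, size_t capacity)
{
    size_t n;
    if (capacity == 0)
        return 0;
    n = copy_at_most(src, srclen, w, capacity - 1);
    w[n] = L'\0';
    return n;
}
```

The same idea applies to the real function: pass the capacity minus one as *size*, check for a ``-1`` return, and only then store the terminator at the returned index.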


.. _builtincodecs:

Built-in Codecs
^^^^^^^^^^^^^^^

Python provides a set of built-in codecs which are written in C for speed. All of
these codecs are directly usable via the following functions.

Many of the following APIs take two arguments, *encoding* and *errors*, which
have the same semantics as the corresponding arguments of the built-in
:func:`unicode` Unicode object constructor.

Setting *encoding* to *NULL* causes the default encoding, which is ASCII, to be
used. File system calls should use :cdata:`Py_FileSystemDefaultEncoding` as the
encoding for file names. This variable should be treated as read-only: on some
systems it will be a pointer to a static string, on others it will change at
run-time (such as when the application invokes setlocale).

Error handling is set by *errors*, which may also be set to *NULL* to use the
default handling defined for the codec. The default error handling for all
built-in codecs is "strict" (:exc:`ValueError` is raised).

The codecs all use a similar interface. For simplicity, only deviations from
the following generic ones are documented.


Generic Codecs
""""""""""""""

These are the generic codec APIs:


.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s*.
   *encoding* and *errors* have the same meaning as the parameters of the same
   name in the :func:`unicode` built-in function. The codec to be used is looked
   up using the Python codec registry. Return *NULL* if an exception was raised
   by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer *s* of the given *size* and return a
   Python string object. *encoding* and *errors* have the same meaning as the
   parameters of the same name in the Unicode :meth:`encode` method. The codec to
   be used is looked up using the Python codec registry. Return *NULL* if an
   exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)

   Encode a Unicode object and return the result as Python string object.
   *encoding* and *errors* have the same meaning as the parameters of the same
   name in the Unicode :meth:`encode` method. The codec to be used is looked up
   using the Python codec registry. Return *NULL* if an exception was raised by
   the codec.


UTF-8 Codecs
""""""""""""

These are the UTF-8 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF8`. If
   *consumed* is not *NULL*, trailing incomplete UTF-8 byte sequences will not be
   treated as an error. Those bytes will not be decoded and the number of bytes
   that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
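Suppose a read splits the three-byte UTF-8 encoding of U+20AC (``E2 82 AC``) after its second byte. A stateful decoder can decode everything before it and leave those two bytes for the next call. The standalone sketch below (hypothetical names, no Python headers) computes how many leading bytes form complete sequences. For input that is otherwise valid UTF-8, this corresponds to the value stored in *consumed*.

```c
#include <stddef.h>

/* Length of the UTF-8 sequence introduced by lead byte b, or 0 if b is a
   continuation byte (10xxxxxx) or cannot start a sequence. */
size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;
    if ((b & 0xE0) == 0xC0)
        return 2;
    if ((b & 0xF0) == 0xE0)
        return 3;
    if ((b & 0xF8) == 0xF0)
        return 4;
    return 0;
}

/* Number of leading bytes of s that form complete sequences. A trailing
   sequence that is cut short is excluded. Assumes the input is otherwise
   valid; a real decoder still reports malformed bytes as errors. */
size_t utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t back;
    /* A sequence is at most 4 bytes long, so its lead byte is among the last 4. */
    for (back = 1; back <= 4 && back <= size; back++) {
        unsigned char b = s[size - back];
        if ((b & 0xC0) != 0x80) {
            size_t need = utf8_seq_len(b);
            return need > back ? size - back : size;
        }
    }
    return size;
}
```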


.. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer *s* of the given *size* using UTF-8 and
   return a Python string object. Return *NULL* if an exception was raised by the
   codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)

   Encode a Unicode object using UTF-8 and return the result as Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


UTF-32 Codecs
"""""""""""""

These are the UTF-32 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-32 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first four bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output.

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   In a narrow build, code points outside the BMP will be decoded as surrogate
   pairs.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6
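The BOM rule above can be sketched as a small standalone check (hypothetical helper name, no Python headers). The UTF-32 BOM U+FEFF is stored as ``FF FE 00 00`` in little-endian data and as ``00 00 FE FF`` in big-endian data. The helper switches ``*byteorder`` and returns the number of bytes to skip, but only when ``*byteorder`` starts out as zero.

```c
#include <stddef.h>

/* Hypothetical helper mirroring the documented rule: when *byteorder is 0
   and the input starts with a UTF-32 BOM, switch to the byte order the BOM
   indicates and return 4 (bytes to skip, so the BOM is not copied).
   Otherwise leave *byteorder unchanged and return 0. */
size_t utf32_apply_bom(const unsigned char *s, size_t size, int *byteorder)
{
    if (*byteorder != 0 || size < 4)
        return 0;
    if (s[0] == 0xFF && s[1] == 0xFE && s[2] == 0x00 && s[3] == 0x00) {
        *byteorder = -1;            /* FF FE 00 00: little endian */
        return 4;
    }
    if (s[0] == 0x00 && s[1] == 0x00 && s[2] == 0xFE && s[3] == 0xFF) {
        *byteorder = 1;             /* 00 00 FE FF: big endian */
        return 4;
    }
    return 0;                       /* no BOM: keep native order */
}
```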


.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
   trailing incomplete UTF-32 byte sequences (such as a number of bytes not
   divisible by four) as an error. Those bytes will not be decoded and the number
   of bytes that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.6


.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-32 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM mark)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended.

   If ``Py_UNICODE_WIDE`` is not defined, surrogate pairs will be output as a
   single code point.

   Return *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6
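Two steps of this encoder can be sketched in isolation (hypothetical helper names, no Python headers). On a narrow build, a surrogate pair in *s* is first combined into one code point. That code point is then written as four bytes in the requested byte order. The sketch assumes well-formed surrogate pairs and does not model the BOM-writing native mode.

```c
#include <stdint.h>

/* Combine a high surrogate (U+D800..U+DBFF) and a low surrogate
   (U+DC00..U+DFFF) into the code point they encode. */
uint32_t join_surrogates(uint16_t hi, uint16_t lo)
{
    return 0x10000u + (((uint32_t)hi - 0xD800u) << 10) + ((uint32_t)lo - 0xDC00u);
}

/* Write cp as one UTF-32 code unit: byteorder -1 selects little endian,
   any other value big endian (native order with a BOM is not modelled). */
void put_utf32(uint32_t cp, int byteorder, unsigned char out[4])
{
    int i;
    for (i = 0; i < 4; i++) {
        unsigned char b = (unsigned char)(cp >> (8 * i)); /* i-th byte, least significant first */
        if (byteorder == -1)
            out[i] = b;
        else
            out[3 - i] = b;
    }
}
```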


.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)

   Return a Python string using the UTF-32 encoding in native byte order. The
   string always starts with a BOM mark. Error handling is "strict". Return
   *NULL* if an exception was raised by the codec.

   .. versionadded:: 2.6


UTF-16 Codecs
"""""""""""""

These are the UTF-16 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)

   Decode *size* bytes from a UTF-16 encoded buffer string and return the
   corresponding Unicode object. *errors* (if non-*NULL*) defines the error
   handling. It defaults to "strict".

   If *byteorder* is non-*NULL*, the decoder starts decoding using the given byte
   order::

      *byteorder == -1: little endian
      *byteorder == 0:  native order
      *byteorder == 1:  big endian

   If ``*byteorder`` is zero, and the first two bytes of the input data are a
   byte order mark (BOM), the decoder switches to this byte order and the BOM is
   not copied into the resulting Unicode string. If ``*byteorder`` is ``-1`` or
   ``1``, any byte order mark is copied to the output (where it will result in
   either a ``\ufeff`` or a ``\ufffe`` character).

   After completion, ``*byteorder`` is set to the current byte order at the end
   of input data.

   If *byteorder* is *NULL*, the codec starts in native order mode.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF16`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeUTF16Stateful` will not treat
   trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
   split surrogate pair) as an error. Those bytes will not be decoded and the
   number of bytes that have been decoded will be stored in *consumed*.

   .. versionadded:: 2.4

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size* and an :ctype:`int *`
      type for *consumed*. This might require changes in your code for
      properly supporting 64-bit systems.
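Both kinds of incomplete tail can be found by inspecting the end of the buffer. The standalone sketch below (hypothetical name, no Python headers, little-endian input only) first drops an odd trailing byte. It then drops a final high surrogate whose low half has not arrived yet. For otherwise valid input, the result corresponds to the count stored in *consumed*.

```c
#include <stddef.h>

/* Number of leading bytes of little-endian UTF-16 data that can be
   decoded now; the remainder would be reported as not consumed. */
size_t utf16le_complete_prefix(const unsigned char *s, size_t size)
{
    size_t n = size & ~(size_t)1;          /* ignore an odd trailing byte */
    if (n >= 2) {
        unsigned unit = (unsigned)s[n - 2] | ((unsigned)s[n - 1] << 8);
        if (unit >= 0xD800 && unit <= 0xDBFF)
            n -= 2;                        /* high surrogate without its low half */
    }
    return n;
}
```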


.. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)

   Return a Python string object holding the UTF-16 encoded value of the Unicode
   data in *s*. Output is written according to the following byte order::

      byteorder == -1: little endian
      byteorder == 0:  native byte order (writes a BOM mark)
      byteorder == 1:  big endian

   If *byteorder* is ``0``, the output string will always start with the Unicode
   BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended.

   If ``Py_UNICODE_WIDE`` is defined, a single :ctype:`Py_UNICODE` value may get
   represented as a surrogate pair. If it is not defined, each :ctype:`Py_UNICODE`
   value is interpreted as a UCS-2 character.

   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
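The surrogate-pair split mentioned above follows the standard UTF-16 arithmetic. It is shown here as a standalone sketch (hypothetical name, no Python headers) that assumes a valid code point no larger than U+10FFFF.

```c
#include <stdint.h>

/* Split code point cp into UTF-16 code units. Returns the number of
   16-bit units written to out: 1 for cp <= U+FFFF, otherwise 2 (a high
   surrogate followed by a low surrogate). Assumes cp <= 0x10FFFF. */
int to_utf16_units(uint32_t cp, uint16_t out[2])
{
    if (cp <= 0xFFFF) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));     /* top 10 bits */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));   /* bottom 10 bits */
    return 2;
}
```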


.. cfunction:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)

   Return a Python string using the UTF-16 encoding in native byte order. The
   string always starts with a BOM mark. Error handling is "strict". Return
   *NULL* if an exception was raised by the codec.


UTF-7 Codecs
""""""""""""

These are the UTF-7 codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.


.. cfunction:: PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeUTF7`. If
   *consumed* is not *NULL*, trailing incomplete UTF-7 base-64 sections will not
   be treated as an error. Those bytes will not be decoded and the number of
   bytes that have been decoded will be stored in *consumed*.


.. cfunction:: PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer *s* of the given *size* using UTF-7 and
   return a Python string object. Return *NULL* if an exception was raised by
   the codec.

   If *base64SetO* is nonzero, "Set O" (punctuation that has no otherwise
   special meaning) will be encoded in base-64. If *base64WhiteSpace* is
   nonzero, whitespace will be encoded in base-64. Both are set to zero for the
   Python "utf-7" codec.


Unicode-Escape Codecs
"""""""""""""""""""""

These are the "Unicode Escape" codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Unicode-Escape encoded
   string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :ctype:`Py_UNICODE` buffer *s* of the given *size* using
   Unicode-Escape and return a Python string object. Return *NULL* if an
   exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.
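The standalone sketch below shows the output format by escaping a single code point roughly the way Python 2's unicode-escape codec does:

* backslash, tab, newline and carriage return get two-character escapes;
* other control characters and U+007F to U+00FF use ``\xhh``;
* larger values use ``\uhhhh`` or ``\Uhhhhhhhh`` with lowercase hex digits;
* printable ASCII is copied unchanged.

The name is hypothetical, and the surrogate-pair handling that a narrow build needs is omitted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>  /* strcmp(), handy for checking results */

/* Approximate Unicode-Escape output for one code point cp. Writes a
   NUL-terminated string into out (11 bytes fits the longest form,
   a backslash, 'U' and 8 hex digits) and returns its length. */
int unicode_escape_cp(uint32_t cp, char out[11])
{
    if (cp >= 0x10000)
        return snprintf(out, 11, "\\U%08x", (unsigned)cp);
    if (cp >= 0x100)
        return snprintf(out, 11, "\\u%04x", (unsigned)cp);
    if (cp == '\\')
        return snprintf(out, 11, "\\\\");
    if (cp == '\t')
        return snprintf(out, 11, "\\t");
    if (cp == '\n')
        return snprintf(out, 11, "\\n");
    if (cp == '\r')
        return snprintf(out, 11, "\\r");
    if (cp < 0x20 || cp >= 0x7F)
        return snprintf(out, 11, "\\x%02x", (unsigned)cp);
    out[0] = (char)cp;   /* printable ASCII is copied unchanged */
    out[1] = '\0';
    return 1;
}
```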
				627
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	628
				629	.. cfunction:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				630
				631	Encode a Unicode object using Unicode-Escape and return the result as Python
				632	string object. Error handling is "strict". Return NULL if an exception was
				633	raised by the codec.
				634
Georg Brandl	f684272	2008-01-19 22:08:21 +0000	[diff] [blame]	635
Victor Stinner	5f8aae0	2010-05-14 15:53:20 +0000	[diff] [blame]	636	Raw-Unicode-Escape Codecs
				637	"""""""""""""""""""""""""
				638
				639	These are the "Raw Unicode Escape" codec APIs:


.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Raw-Unicode-Escape
   encoded string *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)

   Encode the :ctype:`Py_UNICODE` buffer of the given *size* using Raw-Unicode-Escape
   and return a Python string object. Return *NULL* if an exception was raised by
   the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)

   Encode a Unicode object using Raw-Unicode-Escape and return the result as a
   Python string object. Error handling is "strict". Return *NULL* if an exception
   was raised by the codec.

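
Raw-Unicode-Escape differs from Unicode-Escape mainly in what it leaves alone. The sketch below, built around the hypothetical helper ``raw_escape`` and assuming whole code points as input, copies everything up to U+00FF as a single byte (backslashes included) and escapes only larger code points.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>
   #include <string.h>

   /* Hypothetical helper, not part of the Python C API: write the
      Raw-Unicode-Escape form of n code points to out, which must hold at
      least 10 * n + 1 bytes. Returns the number of bytes written, not
      counting the terminating NUL. */
   static size_t raw_escape(const unsigned long *src, size_t n, char *out)
   {
       size_t len = 0;
       assert(out != NULL);
       for (size_t i = 0; i < n; i++) {
           unsigned long cp = src[i];
           if (cp >= 0x10000)
               len += (size_t)sprintf(out + len, "\\U%08lx", cp);
           else if (cp >= 0x100)
               len += (size_t)sprintf(out + len, "\\u%04lx", cp);
           else
               out[len++] = (char)cp;  /* U+0000..U+00FF: one raw byte */
       }
       out[len] = '\0';
       return len;
   }

Because the backslash itself is not escaped, this encoding is compact but less robust than Unicode-Escape when the text already contains sequences such as ``\u``.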

Latin-1 Codecs
""""""""""""""

These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
ordinals, and only these are accepted by the codecs during encoding.


.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given *size* using Latin-1 and return
   a Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)

   Encode a Unicode object using Latin-1 and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.

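
Because Latin-1 is exactly the first 256 code points, both directions reduce to a range check. The helpers below are hypothetical illustrations with strict error handling only; returning the failing index stands in for the :exc:`UnicodeEncodeError` the real codec raises.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical helper: Latin-1 decoding. Byte value b becomes the code
      point with ordinal b, so this direction cannot fail. */
   static void latin1_decode(const unsigned char *src, size_t n,
                             unsigned long *out)
   {
       assert(src != NULL || n == 0);
       for (size_t i = 0; i < n; i++)
           out[i] = src[i];
   }

   /* Hypothetical helper: strict Latin-1 encoding. Returns -1 when every
      code point fits in one byte, otherwise the index of the first code
      point above U+00FF. */
   static long latin1_encode_strict(const unsigned long *src, size_t n,
                                    unsigned char *out)
   {
       for (size_t i = 0; i < n; i++) {
           if (src[i] > 0xff)
               return (long)i;
           out[i] = (unsigned char)src[i];
       }
       return -1;
   }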

ASCII Codecs
""""""""""""

These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
codes generate errors.


.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the ASCII encoded string
   *s*. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given *size* using ASCII and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)

   Encode a Unicode object using ASCII and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.

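
The *errors* argument decides what happens to bytes outside the 7-bit range. The following sketch, with the hypothetical helper ``ascii_decode``, models only the ``"strict"``, ``"replace"`` and ``"ignore"`` handlers and treats *NULL* (the default) as ``"strict"``; returning ``-1`` stands in for a raised exception.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <string.h>

   /* Hypothetical helper, not part of the Python C API: decode n ASCII bytes
      into code points. Returns the number of code points written to out, or
      -1 on the first non-ASCII byte under strict handling. Any errors value
      other than "replace" or "ignore", including NULL, is treated as
      "strict" here. */
   static long ascii_decode(const unsigned char *src, size_t n,
                            const char *errors, unsigned long *out)
   {
       long len = 0;
       assert(src != NULL || n == 0);
       for (size_t i = 0; i < n; i++) {
           if (src[i] < 0x80) {
               out[len++] = src[i];
           } else if (errors != NULL && strcmp(errors, "replace") == 0) {
               out[len++] = 0xFFFD;        /* U+FFFD REPLACEMENT CHARACTER */
           } else if (errors != NULL && strcmp(errors, "ignore") == 0) {
               continue;                   /* drop the offending byte */
           } else {
               return -1;                  /* strict */
           }
       }
       return len;
   }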

Character Map Codecs
""""""""""""""""""""

These are the mapping codec APIs.

This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses a mapping to encode and
decode characters.

Decoding mappings must map single string characters to single Unicode
characters, integers (which are then interpreted as Unicode ordinals) or None
(meaning "undefined mapping" and causing an error).

Encoding mappings must map single Unicode characters to single string
characters, integers (which are then interpreted as Latin-1 ordinals) or None
(meaning "undefined mapping" and causing an error).

The mapping objects provided need only support the :meth:`__getitem__` mapping
interface.

If a character lookup fails with a :exc:`LookupError`, the character is copied
as-is, meaning that its ordinal value will be interpreted as a Unicode or Latin-1
ordinal, respectively. Because of this, mappings only need to contain those
mappings which map characters to different code points.


.. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Create a Unicode object by decoding *size* bytes of the encoded string *s* using
   the given *mapping* object. Return *NULL* if an exception was raised by the
   codec. If *mapping* is *NULL*, Latin-1 decoding will be done. Otherwise, it can
   be a dictionary mapping bytes as described above, or a Unicode string, which is
   treated as a lookup table: byte value *n* decodes to the character at index *n*.
   Byte values greater than or equal to the length of the string, and U+FFFE
   "characters", are treated as "undefined mapping".

   .. versionchanged:: 2.4
      Allowed a Unicode string as the *mapping* argument.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

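
The lookup-table case can be sketched as follows, assuming the Unicode string has already been unpacked into an array of code points. ``charmap_decode_table`` is a hypothetical helper with strict error handling: a byte outside the table, or one that maps to U+FFFE, aborts decoding.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical helper, not part of the Python C API: byte b decodes to
      table[b]. Returns the number of code points written, or -1 on the
      first "undefined mapping" (strict error handling). */
   static long charmap_decode_table(const unsigned char *src, size_t n,
                                    const unsigned long *table, size_t table_len,
                                    unsigned long *out)
   {
       assert(table != NULL || table_len == 0);
       for (size_t i = 0; i < n; i++) {
           if ((size_t)src[i] >= table_len || table[src[i]] == 0xFFFE)
               return -1;                  /* undefined mapping */
           out[i] = table[src[i]];
       }
       return (long)n;
   }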

.. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given *size* using the given
   *mapping* object and return a Python string object. Return *NULL* if an
   exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)

   Encode a Unicode object using the given *mapping* object and return the result
   as a Python string object. Error handling is "strict". Return *NULL* if an
   exception was raised by the codec.

The following codec API is special in that it maps Unicode to Unicode.


.. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)

   Translate a :ctype:`Py_UNICODE` buffer of the given *size* by applying a
   character mapping *table* to it and return the resulting Unicode object. Return
   *NULL* when an exception was raised by the codec.

   The mapping table must map Unicode ordinal integers to Unicode ordinal
   integers or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.

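
The table rules above can be modelled with a plain array in place of the mapping object. In this sketch ``translate``, ``MAP_DELETE`` and ``MAP_MISSING`` are hypothetical: ``MAP_DELETE`` plays the role of None, and ``MAP_MISSING`` (or any code point beyond the array) plays the role of a key that raises :exc:`LookupError`.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   #define MAP_DELETE  (-1L)   /* hypothetical stand-in for a None value */
   #define MAP_MISSING (-2L)   /* hypothetical stand-in for a LookupError */

   /* Hypothetical helper, not part of the Python C API: translate n code
      points through table and return the length written to out. */
   static size_t translate(const unsigned long *src, size_t n,
                           const long *table, size_t table_len,
                           unsigned long *out)
   {
       size_t len = 0;
       assert(table != NULL || table_len == 0);
       for (size_t i = 0; i < n; i++) {
           long m = src[i] < table_len ? table[src[i]] : MAP_MISSING;
           if (m == MAP_DELETE)
               continue;                   /* None: drop the character */
           /* unmapped characters are copied as-is */
           out[len++] = (m == MAP_MISSING) ? src[i] : (unsigned long)m;
       }
       return len;
   }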

MBCS codecs for Windows
"""""""""""""""""""""""

These are the MBCS codec APIs. They are currently only available on Windows and
use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
DBCS) is a class of encodings, not just one. The target encoding is defined by
the user settings on the machine running the codec.


.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)

   Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*.
   Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)

   If *consumed* is *NULL*, behave like :cfunc:`PyUnicode_DecodeMBCS`. If
   *consumed* is not *NULL*, :cfunc:`PyUnicode_DecodeMBCSStateful` will not decode
   a trailing lead byte, and the number of bytes that have been decoded will be
   stored in *consumed*.

   .. versionadded:: 2.5

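
What "will not decode a trailing lead byte" means can be sketched as a scan for the longest prefix that does not split a double-byte character. ``is_lead_byte`` is a hypothetical stand-in using Shift_JIS-like lead byte ranges; on Windows the lead bytes depend on the active code page (Win32 exposes ``IsDBCSLeadByte`` for this test). ``mbcs_decodable_prefix`` is likewise hypothetical and computes roughly the count a stateful decoder would report in *consumed*.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical lead-byte test for a Shift_JIS-like code page. */
   static int is_lead_byte(unsigned char b)
   {
       return (b >= 0x81 && b <= 0x9f) || (b >= 0xe0 && b <= 0xfc);
   }

   /* Hypothetical helper: number of bytes of src that can be decoded now
      without cutting a double-byte character in half. */
   static size_t mbcs_decodable_prefix(const unsigned char *src, size_t n)
   {
       size_t i = 0;
       assert(src != NULL || n == 0);
       while (i < n) {
           size_t width = is_lead_byte(src[i]) ? 2 : 1;
           if (i + width > n)
               break;                      /* trailing lead byte: keep it */
           i += width;
       }
       return i;
   }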

.. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)

   Encode the :ctype:`Py_UNICODE` buffer of the given *size* using MBCS and return a
   Python string object. Return *NULL* if an exception was raised by the codec.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *size*. This might require
      changes in your code for properly supporting 64-bit systems.


.. cfunction:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)

   Encode a Unicode object using MBCS and return the result as a Python string
   object. Error handling is "strict". Return *NULL* if an exception was raised
   by the codec.


Methods & Slots
"""""""""""""""

.. _unicodemethodsandslots:

Methods and Slot Functions
^^^^^^^^^^^^^^^^^^^^^^^^^^

The following APIs are capable of handling Unicode objects and strings on input
(we refer to them as strings in the descriptions) and return Unicode objects or
integers as appropriate.

Unless stated otherwise below, they return *NULL* or ``-1`` if an exception
occurs.


.. cfunction:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)

   Concatenate two strings, giving a new Unicode string.


.. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)

   Split a string, giving a list of Unicode strings. If *sep* is *NULL*, splitting
   will be done at all whitespace substrings. Otherwise, splits occur at the given
   separator. At most *maxsplit* splits will be done; if *maxsplit* is negative, no
   limit is set. Separators are not included in the resulting list.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *maxsplit*. This might require
      changes in your code for properly supporting 64-bit systems.

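
The *maxsplit* rule can be sketched for the explicit-separator case. ``split_count`` is a hypothetical helper that works on C byte strings and only counts the resulting items; it does not model the whitespace splitting used when *sep* is *NULL*.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>
   #include <string.h>

   /* Hypothetical helper, not part of the Python C API: number of items a
      split of s at the non-empty separator sep would produce. A negative
      maxsplit means "no limit". */
   static size_t split_count(const char *s, const char *sep, long maxsplit)
   {
       size_t items = 1, seplen = strlen(sep);
       const char *hit;

       assert(seplen > 0);                 /* Python rejects an empty separator */
       while ((maxsplit < 0 || (long)(items - 1) < maxsplit)
              && (hit = strstr(s, sep)) != NULL) {
           items++;
           s = hit + seplen;               /* the separator itself is dropped */
       }
       return items;
   }

With ``maxsplit`` set to 1, ``"a,b,c"`` yields two items, the second being ``"b,c"``.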

.. cfunction:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)

   Split a Unicode string at line breaks, returning a list of Unicode strings.
   CRLF is considered to be one line break. If *keepend* is 0, the line break
   characters are not included in the resulting strings.

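
How CRLF and *keepend* interact can be shown with a sketch that records line lengths. ``split_lines`` is hypothetical and, as a simplification, recognizes only ``\n``, ``\r`` and ``\r\n``; the real function also breaks at other Unicode line boundaries.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical helper, not part of the Python C API: split the n bytes of
      s at "\r\n", '\r' or '\n', store each line's length in lens (room for n
      entries is enough) and return the number of lines. When keepend is 0,
      the break characters are not counted as part of the line. */
   static size_t split_lines(const char *s, size_t n, int keepend, size_t *lens)
   {
       size_t count = 0, i = 0;
       assert(lens != NULL || n == 0);
       while (i < n) {
           size_t start = i, eol;
           while (i < n && s[i] != '\n' && s[i] != '\r')
               i++;
           eol = i;                        /* end of the line's text */
           if (i < n) {
               if (s[i] == '\r' && i + 1 < n && s[i + 1] == '\n')
                   i += 2;                 /* CRLF is one break */
               else
                   i += 1;
           }
           lens[count++] = (keepend ? i : eol) - start;
       }
       return count;
   }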

.. cfunction:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)

   Translate a string by applying a character mapping table to it and return the
   resulting Unicode object.

   The mapping table must map Unicode ordinal integers to Unicode ordinal integers
   or None (causing deletion of the character).

   Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
   and sequences work well. Unmapped character ordinals (ones which cause a
   :exc:`LookupError`) are left untouched and are copied as-is.

   *errors* has the usual meaning for codecs. It may be *NULL*, which indicates
   that the default error handling should be used.


.. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)

   Join a sequence of strings using the given *separator* and return the resulting
   Unicode string.


.. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return 1 if *substr* matches ``str[start:end]`` at the given tail end
   (*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
   0 otherwise. Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.

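
The prefix/suffix rule can be sketched on C byte strings. ``adjust_indices`` and ``tailmatch`` are hypothetical helpers; normalizing *start* and *end* with Python slice semantics is an assumption about how out-of-range and negative indices are handled.

.. code-block:: c

   #include <assert.h>
   #include <string.h>

   /* Hypothetical helper: clamp start/end the way Python slicing does. */
   static void adjust_indices(long *start, long *end, long len)
   {
       if (*end > len) {
           *end = len;
       } else if (*end < 0) {
           *end += len;
           if (*end < 0)
               *end = 0;
       }
       if (*start < 0) {
           *start += len;
           if (*start < 0)
               *start = 0;
       }
   }

   /* Hypothetical helper, not part of the Python C API: 1 if substr is a
      prefix (direction -1) or suffix (direction 1) of str[start:end]. */
   static int tailmatch(const char *str, const char *substr,
                        long start, long end, int direction)
   {
       long len, sublen;

       assert(str != NULL && substr != NULL);
       len = (long)strlen(str);
       sublen = (long)strlen(substr);
       if (sublen == 0)
           return 1;                       /* the empty string always matches */
       adjust_indices(&start, &end, len);
       end -= sublen;                      /* last position a match may start */
       if (end < start)
           return 0;                       /* slice shorter than substr */
       if (direction > 0)
           return memcmp(str + end, substr, (size_t)sublen) == 0;
       return memcmp(str + start, substr, (size_t)sublen) == 0;
   }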

.. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)

   Return the position of the first match of *substr* in ``str[start:end]``,
   searching in the given *direction* (*direction* == 1 means to do a forward
   search, *direction* == -1 a backward search). The return value is an index
   into *str*; a value of ``-1`` indicates that no match was found, and ``-2``
   indicates that an error occurred and an exception has been set.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *start* and *end*. This
      might require changes in your code for properly supporting 64-bit
      systems.

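
The return convention can be sketched with a naive search over C byte strings. ``find_in_slice`` is a hypothetical helper that assumes already-normalized indices (``0 <= start <= end <= strlen(str)``) and leaves out the ``-2`` error case.

.. code-block:: c

   #include <assert.h>
   #include <string.h>

   /* Hypothetical helper, not part of the Python C API: search
      str[start:end] for substr. direction 1 returns the lowest matching
      index, direction -1 the highest; -1 means no match. Indices are
      positions in str, not in the slice. */
   static long find_in_slice(const char *str, const char *substr,
                             long start, long end, int direction)
   {
       long sublen = (long)strlen(substr);
       long last = end - sublen;           /* last position a match may start */

       assert(0 <= start && start <= end && end <= (long)strlen(str));
       if (direction > 0) {
           for (long i = start; i <= last; i++)
               if (memcmp(str + i, substr, (size_t)sublen) == 0)
                   return i;
       } else {
           for (long i = last; i >= start; i--)
               if (memcmp(str + i, substr, (size_t)sublen) == 0)
                   return i;
       }
       return -1;
   }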

.. cfunction:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)

   Return the number of non-overlapping occurrences of *substr* in
   ``str[start:end]``. Return ``-1`` if an error occurred.

   .. versionchanged:: 2.5
      This function returned an :ctype:`int` type and used an :ctype:`int`
      type for *start* and *end*. This might require changes in your code for
      properly supporting 64-bit systems.

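
Counting non-overlapping occurrences means the scan resumes after the end of each match. ``count_nonoverlapping`` is a hypothetical helper on C byte strings, assuming a non-empty *substr* and normalized indices; with ``"aaaa"`` and ``"aa"`` it finds two matches, not three.

.. code-block:: c

   #include <assert.h>
   #include <string.h>

   /* Hypothetical helper, not part of the Python C API: count non-overlapping
      occurrences of a non-empty substr in str[start:end]. */
   static long count_nonoverlapping(const char *str, const char *substr,
                                    long start, long end)
   {
       long sublen = (long)strlen(substr), n = 0, i = start;

       assert(sublen > 0);
       assert(0 <= start && start <= end && end <= (long)strlen(str));
       while (i + sublen <= end) {
           if (memcmp(str + i, substr, (size_t)sublen) == 0) {
               n++;
               i += sublen;                /* resume after the match */
           } else {
               i++;
           }
       }
       return n;
   }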

.. cfunction:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)

   Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
   return the resulting Unicode object. *maxcount* == -1 means replace all
   occurrences.

   .. versionchanged:: 2.5
      This function used an :ctype:`int` type for *maxcount*. This might
      require changes in your code for properly supporting 64-bit systems.

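
The *maxcount* rule can be sketched with a two-pass replacement over C byte strings. ``replace_at_most`` is hypothetical; it assumes a non-empty *substr* and treats any negative *maxcount* like ``-1`` ("replace all").

.. code-block:: c

   #include <assert.h>
   #include <stdlib.h>
   #include <string.h>

   /* Hypothetical helper, not part of the Python C API: replace at most
      maxcount non-overlapping occurrences of substr in str (all of them
      when maxcount < 0). Returns a malloc'ed string the caller must free,
      or NULL if memory runs out. */
   static char *replace_at_most(const char *str, const char *substr,
                                const char *replstr, long maxcount)
   {
       size_t sublen = strlen(substr), replen = strlen(replstr), hits = 0;
       const char *p = str, *hit;
       char *result, *out;

       assert(sublen > 0);
       /* First pass: count the replacements that will actually be made. */
       while ((maxcount < 0 || (long)hits < maxcount)
              && (hit = strstr(p, substr)) != NULL) {
           hits++;
           p = hit + sublen;
       }
       result = malloc(strlen(str) - hits * sublen + hits * replen + 1);
       if (result == NULL)
           return NULL;
       /* Second pass: copy str, substituting the first `hits` occurrences. */
       out = result;
       p = str;
       for (size_t k = 0; k < hits; k++) {
           hit = strstr(p, substr);
           memcpy(out, p, (size_t)(hit - p));
           out += hit - p;
           memcpy(out, replstr, replen);
           out += replen;
           p = hit + sublen;
       }
       strcpy(out, p);                     /* the untouched tail */
       return result;
   }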

.. cfunction:: int PyUnicode_Compare(PyObject *left, PyObject *right)

   Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
   respectively.

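
The ``-1``/``0``/``1`` convention can be illustrated with a lexicographic comparison over arrays of code points. ``compare_code_points`` is hypothetical and orders strings by code point value; it illustrates the return convention rather than describing CPython's internals.

.. code-block:: c

   #include <assert.h>
   #include <stddef.h>

   /* Hypothetical helper, not part of the Python C API: compare two code
      point arrays lexicographically and return -1, 0 or 1. */
   static int compare_code_points(const unsigned long *a, size_t alen,
                                  const unsigned long *b, size_t blen)
   {
       size_t n = alen < blen ? alen : blen;

       assert((a != NULL || alen == 0) && (b != NULL || blen == 0));
       for (size_t i = 0; i < n; i++)
           if (a[i] != b[i])
               return a[i] < b[i] ? -1 : 1;   /* first difference decides */
       if (alen == blen)
           return 0;
       return alen < blen ? -1 : 1;           /* a proper prefix sorts first */
   }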

.. cfunction:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)

   Rich compare two Unicode strings and return one of the following:

   * ``NULL`` in case an exception was raised
   * :const:`Py_True` or :const:`Py_False` for successful comparisons
   * :const:`Py_NotImplemented` in case the type combination is unknown

   Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
   :exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
   with a :exc:`UnicodeDecodeError`.

   Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
   :const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.

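
The relationship between *op* and a three-way comparison can be sketched as follows. The ``cmp_op`` enumeration is a hypothetical stand-in for the ``Py_LT`` ... ``Py_GE`` constants, and ``rich_result`` returns a C truth value where the real function returns :const:`Py_True` or :const:`Py_False`.

.. code-block:: c

   #include <assert.h>

   /* Hypothetical stand-ins for Py_LT ... Py_GE (the real ones come from
      Python.h). */
   enum cmp_op { OP_LT, OP_LE, OP_EQ, OP_NE, OP_GT, OP_GE };

   /* Hypothetical helper: turn a three-way result c (-1, 0 or 1) into the
      truth value requested by op; -1 signals an unknown op. */
   static int rich_result(int c, enum cmp_op op)
   {
       assert(c >= -1 && c <= 1);
       switch (op) {
       case OP_LT: return c < 0;
       case OP_LE: return c <= 0;
       case OP_EQ: return c == 0;
       case OP_NE: return c != 0;
       case OP_GT: return c > 0;
       case OP_GE: return c >= 0;
       }
       return -1;
   }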

.. cfunction:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)

   Return a new string object from *format* and *args*; this is analogous to
   ``format % args``. The *args* argument must be a tuple.


.. cfunction:: int PyUnicode_Contains(PyObject *container, PyObject *element)

   Check whether *element* is contained in *container* and return ``1`` or ``0``
   accordingly.

   *element* has to coerce to a one-element Unicode string. ``-1`` is returned if
   there was an error.