Blame - Doc/c-api/unicode.rst - platform/external/python/cpython2

.. highlightlang:: c
				2
				3	.. _unicodeobjects:
				4
				5	Unicode Objects and Codecs
				6	--------------------------
				7
				8	.. sectionauthor:: Marc-Andre Lemburg <mal@lemburg.com>
				9
				10	Unicode Objects
				11	^^^^^^^^^^^^^^^
				12
				13
				14	These are the basic Unicode object types used for the Unicode implementation in
				15	Python:
				16
				17	.. % --- Unicode Type -------------------------------------------------------
				18
				19
				20	.. ctype:: Py_UNICODE
				21
This type is the storage type that Python uses internally to hold Unicode
ordinals. Python's default builds use a 16-bit type for :ctype:`Py_UNICODE` and
store Unicode values internally as UCS2. It is also possible to build a UCS4
version of Python (most recent Linux distributions come with UCS4 builds of
Python). These builds use a 32-bit type for :ctype:`Py_UNICODE` and store
Unicode data internally as UCS4. On platforms where :ctype:`wchar_t` is
available and compatible with the chosen Python Unicode build variant,
:ctype:`Py_UNICODE` is a typedef alias for :ctype:`wchar_t` to enhance native
platform compatibility. On all other platforms, :ctype:`Py_UNICODE` is a
typedef alias for either :ctype:`unsigned short` (UCS2) or
:ctype:`unsigned long` (UCS4).
				33
				34	Note that UCS2 and UCS4 Python builds are not binary compatible. Please keep
				35	this in mind when writing extensions or interfaces.
				36
				37
				38	.. ctype:: PyUnicodeObject
				39
				40	This subtype of :ctype:`PyObject` represents a Python Unicode object.
				41
				42
				43	.. cvar:: PyTypeObject PyUnicode_Type
				44
				45	This instance of :ctype:`PyTypeObject` represents the Python Unicode type. It
				46	is exposed to Python code as ``unicode`` and ``types.UnicodeType``.
				47
				48	The following APIs are really C macros and can be used to do fast checks and to
				49	access internal read-only data of Unicode objects:
				50
				51
				52	.. cfunction:: int PyUnicode_Check(PyObject *o)
				53
				54	Return true if the object o is a Unicode object or an instance of a Unicode
				55	subtype.
				56
				57	.. versionchanged:: 2.2
				58	Allowed subtypes to be accepted.
				59
				60
				61	.. cfunction:: int PyUnicode_CheckExact(PyObject *o)
				62
				63	Return true if the object o is a Unicode object, but not an instance of a
				64	subtype.
				65
				66	.. versionadded:: 2.2
				67
				68
				69	.. cfunction:: Py_ssize_t PyUnicode_GET_SIZE(PyObject *o)
				70
				71	Return the size of the object. o has to be a :ctype:`PyUnicodeObject` (not
				72	checked).
				73
				74
				75	.. cfunction:: Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o)
				76
				77	Return the size of the object's internal buffer in bytes. o has to be a
				78	:ctype:`PyUnicodeObject` (not checked).
				79
				80
				81	.. cfunction:: Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o)
				82
				83	Return a pointer to the internal :ctype:`Py_UNICODE` buffer of the object. o
				84	has to be a :ctype:`PyUnicodeObject` (not checked).
				85
				86
				87	.. cfunction:: const char* PyUnicode_AS_DATA(PyObject *o)
				88
				89	Return a pointer to the internal buffer of the object. o has to be a
				90	:ctype:`PyUnicodeObject` (not checked).
				91
				92	Unicode provides many different character properties. The most often needed ones
				93	are available through these macros which are mapped to C functions depending on
				94	the Python configuration.
				95
				96	.. % --- Unicode character properties ---------------------------------------
				97
				98
				99	.. cfunction:: int Py_UNICODE_ISSPACE(Py_UNICODE ch)
				100
				101	Return 1 or 0 depending on whether ch is a whitespace character.
				102
				103
				104	.. cfunction:: int Py_UNICODE_ISLOWER(Py_UNICODE ch)
				105
				106	Return 1 or 0 depending on whether ch is a lowercase character.
				107
				108
				109	.. cfunction:: int Py_UNICODE_ISUPPER(Py_UNICODE ch)
				110
				111	Return 1 or 0 depending on whether ch is an uppercase character.
				112
				113
				114	.. cfunction:: int Py_UNICODE_ISTITLE(Py_UNICODE ch)
				115
				116	Return 1 or 0 depending on whether ch is a titlecase character.
				117
				118
				119	.. cfunction:: int Py_UNICODE_ISLINEBREAK(Py_UNICODE ch)
				120
				121	Return 1 or 0 depending on whether ch is a linebreak character.
				122
				123
				124	.. cfunction:: int Py_UNICODE_ISDECIMAL(Py_UNICODE ch)
				125
				126	Return 1 or 0 depending on whether ch is a decimal character.
				127
				128
				129	.. cfunction:: int Py_UNICODE_ISDIGIT(Py_UNICODE ch)
				130
				131	Return 1 or 0 depending on whether ch is a digit character.
				132
				133
				134	.. cfunction:: int Py_UNICODE_ISNUMERIC(Py_UNICODE ch)
				135
				136	Return 1 or 0 depending on whether ch is a numeric character.
				137
				138
				139	.. cfunction:: int Py_UNICODE_ISALPHA(Py_UNICODE ch)
				140
				141	Return 1 or 0 depending on whether ch is an alphabetic character.
				142
				143
				144	.. cfunction:: int Py_UNICODE_ISALNUM(Py_UNICODE ch)
				145
				146	Return 1 or 0 depending on whether ch is an alphanumeric character.
				147
				148	These APIs can be used for fast direct character conversions:
				149
				150
				151	.. cfunction:: Py_UNICODE Py_UNICODE_TOLOWER(Py_UNICODE ch)
				152
				153	Return the character ch converted to lower case.
				154
				155
				156	.. cfunction:: Py_UNICODE Py_UNICODE_TOUPPER(Py_UNICODE ch)
				157
				158	Return the character ch converted to upper case.
				159
				160
				161	.. cfunction:: Py_UNICODE Py_UNICODE_TOTITLE(Py_UNICODE ch)
				162
				163	Return the character ch converted to title case.
				164
				165
				166	.. cfunction:: int Py_UNICODE_TODECIMAL(Py_UNICODE ch)
				167
				168	Return the character ch converted to a decimal positive integer. Return
				169	``-1`` if this is not possible. This macro does not raise exceptions.
				170
				171
				172	.. cfunction:: int Py_UNICODE_TODIGIT(Py_UNICODE ch)
				173
				174	Return the character ch converted to a single digit integer. Return ``-1`` if
				175	this is not possible. This macro does not raise exceptions.
				176
				177
				178	.. cfunction:: double Py_UNICODE_TONUMERIC(Py_UNICODE ch)
				179
				180	Return the character ch converted to a double. Return ``-1.0`` if this is not
				181	possible. This macro does not raise exceptions.
				182
				183	To create Unicode objects and access their basic sequence properties, use these
				184	APIs:
				185
				186	.. % --- Plain Py_UNICODE ---------------------------------------------------
				187
				188
				189	.. cfunction:: PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)
				190
				191	Create a Unicode Object from the Py_UNICODE buffer u of the given size. u
				192	may be NULL which causes the contents to be undefined. It is the user's
				193	responsibility to fill in the needed data. The buffer is copied into the new
				194	object. If the buffer is not NULL, the return value might be a shared object.
				195	Therefore, modification of the resulting Unicode object is only allowed when u
				196	is NULL.
				197
				198
				199	.. cfunction:: Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode)
				200
				201	Return a read-only pointer to the Unicode object's internal :ctype:`Py_UNICODE`
				202	buffer, NULL if unicode is not a Unicode object.
				203
				204
				205	.. cfunction:: Py_ssize_t PyUnicode_GetSize(PyObject *unicode)
				206
				207	Return the length of the Unicode object.
				208
				209
.. cfunction:: PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors)
				211
Coerce an encoded object obj to a Unicode object and return a new reference
to it.
				214
				215	String and other char buffer compatible objects are decoded according to the
				216	given encoding and using the error handling defined by errors. Both can be
				217	NULL to have the interface use the default values (see the next section for
				218	details).
				219
				220	All other objects, including Unicode objects, cause a :exc:`TypeError` to be
				221	set.
				222
				223	The API returns NULL if there was an error. The caller is responsible for
				224	decref'ing the returned objects.
				225
				226
				227	.. cfunction:: PyObject* PyUnicode_FromObject(PyObject *obj)
				228
				229	Shortcut for ``PyUnicode_FromEncodedObject(obj, NULL, "strict")`` which is used
				230	throughout the interpreter whenever coercion to Unicode is needed.
				231
				232	If the platform supports :ctype:`wchar_t` and provides a header file wchar.h,
				233	Python can interface directly to this type using the following functions.
				234	Support is optimized if Python's own :ctype:`Py_UNICODE` type is identical to
				235	the system's :ctype:`wchar_t`.
				236
				237	.. % --- wchar_t support for platforms which support it ---------------------
				238
				239
				240	.. cfunction:: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
				241
				242	Create a Unicode object from the :ctype:`wchar_t` buffer w of the given size.
				243	Return NULL on failure.
				244
				245
.. cfunction:: Py_ssize_t PyUnicode_AsWideChar(PyUnicodeObject *unicode, wchar_t *w, Py_ssize_t size)
				247
				248	Copy the Unicode object contents into the :ctype:`wchar_t` buffer w. At most
				249	size :ctype:`wchar_t` characters are copied (excluding a possibly trailing
				250	0-termination character). Return the number of :ctype:`wchar_t` characters
				251	copied or -1 in case of an error. Note that the resulting :ctype:`wchar_t`
				252	string may or may not be 0-terminated. It is the responsibility of the caller
				253	to make sure that the :ctype:`wchar_t` string is 0-terminated in case this is
				254	required by the application.
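
Because the copied string may not be 0-terminated, a caller that needs a C wide string can reserve one extra slot and write the terminator itself. The helper below is a minimal sketch of that pattern in plain C. ``terminate_wide`` is a hypothetical name, not part of the Python API, and ``copied`` stands for the value returned by :cfunc:`PyUnicode_AsWideChar`.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not part of the Python C API).
 * buf      - buffer that received the wide characters
 * capacity - total number of wchar_t slots in buf
 * copied   - return value of the copy call (-1 signals an error)
 * Returns 0 after writing the terminator, -1 if that is not possible. */
int
terminate_wide(wchar_t *buf, size_t capacity, long copied)
{
    assert(buf != NULL);
    if (copied < 0 || (size_t)copied >= capacity)
        return -1;              /* error, or no room left for the terminator */
    buf[copied] = L'\0';
    return 0;
}
```

Under this sketch, a caller would pass ``capacity - 1`` as the size argument of the copy call so that one slot always remains free for the terminator.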
				255
				256
				257	.. _builtincodecs:
				258
				259	Built-in Codecs
				260	^^^^^^^^^^^^^^^
				261
				262	Python provides a set of builtin codecs which are written in C for speed. All of
				263	these codecs are directly usable via the following functions.
				264
				265	Many of the following APIs take two arguments encoding and errors. These
				266	parameters encoding and errors have the same semantics as the ones of the
				267	builtin unicode() Unicode object constructor.
				268
				269	Setting encoding to NULL causes the default encoding to be used which is
				270	ASCII. The file system calls should use :cdata:`Py_FileSystemDefaultEncoding`
				271	as the encoding for file names. This variable should be treated as read-only: On
				272	some systems, it will be a pointer to a static string, on others, it will change
				273	at run-time (such as when the application invokes setlocale).
				274
				275	Error handling is set by errors which may also be set to NULL meaning to use
				276	the default handling defined for the codec. Default error handling for all
				277	builtin codecs is "strict" (:exc:`ValueError` is raised).
				278
The codecs all use a similar interface. For simplicity, only deviations from
the following generic ones are documented.
				281
				282	These are the generic codec APIs:
				283
				284	.. % --- Generic Codecs -----------------------------------------------------
				285
				286
.. cfunction:: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors)
				288
				289	Create a Unicode object by decoding size bytes of the encoded string s.
				290	encoding and errors have the same meaning as the parameters of the same name
				291	in the :func:`unicode` builtin function. The codec to be used is looked up
				292	using the Python codec registry. Return NULL if an exception was raised by
				293	the codec.
				294
				295
.. cfunction:: PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors)
				297
				298	Encode the :ctype:`Py_UNICODE` buffer of the given size and return a Python
				299	string object. encoding and errors have the same meaning as the parameters
				300	of the same name in the Unicode :meth:`encode` method. The codec to be used is
				301	looked up using the Python codec registry. Return NULL if an exception was
				302	raised by the codec.
				303
				304
.. cfunction:: PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors)
				306
				307	Encode a Unicode object and return the result as Python string object.
				308	encoding and errors have the same meaning as the parameters of the same name
				309	in the Unicode :meth:`encode` method. The codec to be used is looked up using
				310	the Python codec registry. Return NULL if an exception was raised by the
				311	codec.
				312
				313	These are the UTF-8 codec APIs:
				314
				315	.. % --- UTF-8 Codecs -------------------------------------------------------
				316
				317
.. cfunction:: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors)
				319
				320	Create a Unicode object by decoding size bytes of the UTF-8 encoded string
				321	s. Return NULL if an exception was raised by the codec.
				322
				323
.. cfunction:: PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed)
				325
				326	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF8`. If
				327	consumed is not NULL, trailing incomplete UTF-8 byte sequences will not be
				328	treated as an error. Those bytes will not be decoded and the number of bytes
				329	that have been decoded will be stored in consumed.
				330
				331	.. versionadded:: 2.4
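
The handling of trailing incomplete sequences can be pictured with a small standalone model. The sketch below only computes how many leading bytes of a buffer could be decoded without cutting a multi-byte UTF-8 sequence in half, which is the kind of value a stateful decoder would report through consumed. ``utf8_complete_prefix`` is a hypothetical name. This is not CPython's decoder and it does not validate the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: return the number of leading bytes of s that do not
 * end in the middle of a multi-byte UTF-8 sequence.  Invalid input is
 * reported as fully consumable and left to the real decoder's error
 * handling. */
size_t
utf8_complete_prefix(const unsigned char *s, size_t size)
{
    size_t i = size;
    size_t back = 0;
    size_t need;
    unsigned char lead;

    assert(s != NULL || size == 0);
    /* Walk back over at most three continuation bytes (10xxxxxx). */
    while (i > 0 && back < 3 && (s[i - 1] & 0xC0) == 0x80) {
        i--;
        back++;
    }
    if (i == 0)
        return size;            /* empty, or continuation bytes only */

    lead = s[i - 1];
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return size;            /* not a lead byte: not an "incomplete" case */

    /* back + 1 bytes of the last sequence are present. */
    return (back + 1 < need) ? i - 1 : size;
}
```

A streaming caller would keep the bytes after the returned offset and prepend them to the next chunk before decoding again.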
				332
				333
.. cfunction:: PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				335
				336	Encode the :ctype:`Py_UNICODE` buffer of the given size using UTF-8 and return a
				337	Python string object. Return NULL if an exception was raised by the codec.
				338
				339
				340	.. cfunction:: PyObject* PyUnicode_AsUTF8String(PyObject *unicode)
				341
				342	Encode a Unicode object using UTF-8 and return the result as Python string
				343	object. Error handling is "strict". Return NULL if an exception was raised
				344	by the codec.
				345
				346	These are the UTF-32 codec APIs:
				347
				348	.. % --- UTF-32 Codecs ------------------------------------------------------ */
				349
				350
.. cfunction:: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
				352
Decode size bytes from a UTF-32 encoded buffer string and return the
corresponding Unicode object. errors (if non-NULL) defines the error
handling. It defaults to "strict".
				356
				357	If byteorder is non-NULL, the decoder starts decoding using the given byte
				358	order::
				359
				360	*byteorder == -1: little endian
				361	*byteorder == 0: native order
				362	*byteorder == 1: big endian
				363
and then switches if the first four bytes of the input data are a byte order mark
(BOM) and the specified byte order is native order. This BOM is not copied into
the resulting Unicode string. After completion, ``*byteorder`` is set to the
current byte order at the end of input data.
				368
				369	In a narrow build codepoints outside the BMP will be decoded as surrogate pairs.
				370
				371	If byteorder is NULL, the codec starts in native order mode.
				372
				373	Return NULL if an exception was raised by the codec.
				374
				375	.. versionadded:: 2.6
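
The narrow-build note above refers to the standard UTF-16 surrogate arithmetic. As a minimal sketch in plain C, assuming 16-bit storage units, a code point outside the BMP is split as follows. ``split_code_point`` is a hypothetical name and not part of the Python API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store code point cp as one or two 16-bit units.
 * Returns the number of units written, or 0 if cp is above U+10FFFF. */
size_t
split_code_point(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;              /* BMP: a single unit */
        return 1;
    }
    cp -= 0x10000;                          /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```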
				376
				377
.. cfunction:: PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
				379
				380	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF32`. If
				381	consumed is not NULL, :cfunc:`PyUnicode_DecodeUTF32Stateful` will not treat
				382	trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible
				383	by four) as an error. Those bytes will not be decoded and the number of bytes
				384	that have been decoded will be stored in consumed.
				385
				386	.. versionadded:: 2.6
				387
				388
.. cfunction:: PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
				390
Return a Python bytes object holding the UTF-32 encoded value of the Unicode
data in s. Output is written according to the following byte order::
				394
				395	byteorder == -1: little endian
				396	byteorder == 0: native byte order (writes a BOM mark)
				397	byteorder == 1: big endian
				398
				399	If byteorder is ``0``, the output string will always start with the Unicode BOM
				400	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				401
				402	If Py_UNICODE_WIDE is not defined, surrogate pairs will be output
				403	as a single codepoint.
				404
				405	Return NULL if an exception was raised by the codec.
				406
				407	.. versionadded:: 2.6
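
Emitting a surrogate pair as a single code point, as described for narrow builds, is the inverse of the surrogate split. The following plain C sketch shows the arithmetic. ``join_surrogates`` is a hypothetical name, and returning ``0`` for an invalid pair is a convention chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: combine a high and a low surrogate into the code
 * point they represent.  Returns 0 if the two units are not a valid pair. */
uint32_t
join_surrogates(uint16_t high, uint16_t low)
{
    uint32_t cp;

    if (high < 0xD800 || high > 0xDBFF || low < 0xDC00 || low > 0xDFFF)
        return 0;
    cp = 0x10000
         + ((uint32_t)(high - 0xD800) << 10)
         + (uint32_t)(low - 0xDC00);
    assert(cp >= 0x10000 && cp <= 0x10FFFF);   /* always a supplementary code point */
    return cp;
}
```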
				408
				409
				410	.. cfunction:: PyObject* PyUnicode_AsUTF32String(PyObject *unicode)
				411
				412	Return a Python string using the UTF-32 encoding in native byte order. The
				413	string always starts with a BOM mark. Error handling is "strict". Return
				414	NULL if an exception was raised by the codec.
				415
				416	.. versionadded:: 2.6
				417
				418
				419	These are the UTF-16 codec APIs:
				420
				421	.. % --- UTF-16 Codecs ------------------------------------------------------ */
				422
				423
.. cfunction:: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder)
				425
Decode size bytes from a UTF-16 encoded buffer string and return the
corresponding Unicode object. errors (if non-NULL) defines the error
handling. It defaults to "strict".
				429
				430	If byteorder is non-NULL, the decoder starts decoding using the given byte
				431	order::
				432
				433	*byteorder == -1: little endian
				434	*byteorder == 0: native order
				435	*byteorder == 1: big endian
				436
and then switches if the first two bytes of the input data are a byte order mark
(BOM) and the specified byte order is native order. This BOM is not copied into
the resulting Unicode string. After completion, ``*byteorder`` is set to the
current byte order at the end of input data.
				441
				442	If byteorder is NULL, the codec starts in native order mode.
				443
				444	Return NULL if an exception was raised by the codec.
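
The byte order switch described above can be modelled in a few lines of plain C. This is only a sketch of the documented behaviour, not the codec's implementation: ``utf16_bom_skip`` is a hypothetical name, and leaving ``*byteorder`` at ``0`` when no BOM is present is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the BOM check.  If *byteorder is 0 (native) and the
 * input starts with a BOM, set *byteorder to -1 (little endian) or 1 (big
 * endian) and return the number of bytes to skip.  Otherwise return 0 and
 * leave *byteorder unchanged. */
size_t
utf16_bom_skip(const unsigned char *s, size_t size, int *byteorder)
{
    assert(byteorder != NULL);
    if (*byteorder != 0 || size < 2)
        return 0;
    if (s[0] == 0xFF && s[1] == 0xFE) {     /* U+FEFF stored little endian */
        *byteorder = -1;
        return 2;
    }
    if (s[0] == 0xFE && s[1] == 0xFF) {     /* U+FEFF stored big endian */
        *byteorder = 1;
        return 2;
    }
    return 0;
}
```

Note that an explicit byte order of ``-1`` or ``1`` disables the check in this model, matching the statement that the switch only happens when native order was requested.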
				445
				446
.. cfunction:: PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed)
				448
				449	If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeUTF16`. If
				450	consumed is not NULL, :cfunc:`PyUnicode_DecodeUTF16Stateful` will not treat
				451	trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a
				452	split surrogate pair) as an error. Those bytes will not be decoded and the
				453	number of bytes that have been decoded will be stored in consumed.
				454
				455	.. versionadded:: 2.4
				456
				457
.. cfunction:: PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder)
				459
Return a Python string object holding the UTF-16 encoded value of the Unicode
data in s. Output is written according to the following byte order::
				463
				464	byteorder == -1: little endian
				465	byteorder == 0: native byte order (writes a BOM mark)
				466	byteorder == 1: big endian
				467
				468	If byteorder is ``0``, the output string will always start with the Unicode BOM
				469	mark (U+FEFF). In the other two modes, no BOM mark is prepended.
				470
If Py_UNICODE_WIDE is defined, a single :ctype:`Py_UNICODE` value may get
represented as a surrogate pair. If it is not defined, each :ctype:`Py_UNICODE`
value is interpreted as a UCS-2 character.
				474
				475	Return NULL if an exception was raised by the codec.
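
The byteorder convention for encoding can be illustrated with a standalone model that serializes 16-bit units. The sketch below is plain C under stated assumptions: ``write_utf16`` and ``host_is_little_endian`` are hypothetical names, the input is already a sequence of 16-bit units, and no surrogate handling is performed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int
host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Hypothetical model: write n 16-bit units to out using the byteorder
 * convention described above (-1 little endian, 1 big endian, 0 native
 * order with a leading BOM).  out needs room for 2 * n bytes, plus 2 more
 * when byteorder is 0.  Returns the number of bytes written. */
size_t
write_utf16(const uint16_t *units, size_t n, int byteorder, unsigned char *out)
{
    size_t pos = 0;
    size_t i;
    int little;

    assert(out != NULL);
    assert(units != NULL || n == 0);
    little = (byteorder == -1) || (byteorder == 0 && host_is_little_endian());
    if (byteorder == 0) {
        /* Native order: prepend the BOM U+FEFF in that order. */
        out[pos++] = little ? 0xFF : 0xFE;
        out[pos++] = little ? 0xFE : 0xFF;
    }
    for (i = 0; i < n; i++) {
        unsigned char lo = (unsigned char)(units[i] & 0xFF);
        unsigned char hi = (unsigned char)(units[i] >> 8);
        out[pos++] = little ? lo : hi;
        out[pos++] = little ? hi : lo;
    }
    return pos;
}
```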
				476
				477
				478	.. cfunction:: PyObject* PyUnicode_AsUTF16String(PyObject *unicode)
				479
				480	Return a Python string using the UTF-16 encoding in native byte order. The
				481	string always starts with a BOM mark. Error handling is "strict". Return
				482	NULL if an exception was raised by the codec.
				483
				484	These are the "Unicode Escape" codec APIs:
				485
				486	.. % --- Unicode-Escape Codecs ----------------------------------------------
				487
				488
.. cfunction:: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
				490
				491	Create a Unicode object by decoding size bytes of the Unicode-Escape encoded
				492	string s. Return NULL if an exception was raised by the codec.
				493
				494
				495	.. cfunction:: PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size)
				496
				497	Encode the :ctype:`Py_UNICODE` buffer of the given size using Unicode-Escape and
				498	return a Python string object. Return NULL if an exception was raised by the
				499	codec.
				500
				501
				502	.. cfunction:: PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode)
				503
				504	Encode a Unicode object using Unicode-Escape and return the result as Python
				505	string object. Error handling is "strict". Return NULL if an exception was
				506	raised by the codec.
				507
				508	These are the "Raw Unicode Escape" codec APIs:
				509
				510	.. % --- Raw-Unicode-Escape Codecs ------------------------------------------
				511
				512
.. cfunction:: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors)
				514
				515	Create a Unicode object by decoding size bytes of the Raw-Unicode-Escape
				516	encoded string s. Return NULL if an exception was raised by the codec.
				517
				518
.. cfunction:: PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				520
				521	Encode the :ctype:`Py_UNICODE` buffer of the given size using Raw-Unicode-Escape
				522	and return a Python string object. Return NULL if an exception was raised by
				523	the codec.
				524
				525
				526	.. cfunction:: PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode)
				527
				528	Encode a Unicode object using Raw-Unicode-Escape and return the result as
				529	Python string object. Error handling is "strict". Return NULL if an exception
				530	was raised by the codec.
				531
These are the Latin-1 codec APIs. Latin-1 corresponds to the first 256 Unicode
ordinals and only these are accepted by the codecs during encoding.
				534
				535	.. % --- Latin-1 Codecs -----------------------------------------------------
				536
				537
.. cfunction:: PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors)
				539
				540	Create a Unicode object by decoding size bytes of the Latin-1 encoded string
				541	s. Return NULL if an exception was raised by the codec.
				542
				543
.. cfunction:: PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				545
				546	Encode the :ctype:`Py_UNICODE` buffer of the given size using Latin-1 and return
				547	a Python string object. Return NULL if an exception was raised by the codec.
				548
				549
				550	.. cfunction:: PyObject* PyUnicode_AsLatin1String(PyObject *unicode)
				551
				552	Encode a Unicode object using Latin-1 and return the result as Python string
				553	object. Error handling is "strict". Return NULL if an exception was raised
				554	by the codec.
				555
				556	These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other
				557	codes generate errors.
				558
				559	.. % --- ASCII Codecs -------------------------------------------------------
				560
				561
.. cfunction:: PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors)
				563
				564	Create a Unicode object by decoding size bytes of the ASCII encoded string
				565	s. Return NULL if an exception was raised by the codec.
				566
				567
.. cfunction:: PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				569
				570	Encode the :ctype:`Py_UNICODE` buffer of the given size using ASCII and return a
				571	Python string object. Return NULL if an exception was raised by the codec.
				572
				573
				574	.. cfunction:: PyObject* PyUnicode_AsASCIIString(PyObject *unicode)
				575
				576	Encode a Unicode object using ASCII and return the result as Python string
				577	object. Error handling is "strict". Return NULL if an exception was raised
				578	by the codec.
				579
				580	These are the mapping codec APIs:
				581
				582	.. % --- Character Map Codecs -----------------------------------------------
				583
This codec is special in that it can be used to implement many different codecs
(and this is in fact what was done to obtain most of the standard codecs
included in the :mod:`encodings` package). The codec uses mappings to encode and
decode characters.
				588
				589	Decoding mappings must map single string characters to single Unicode
				590	characters, integers (which are then interpreted as Unicode ordinals) or None
				591	(meaning "undefined mapping" and causing an error).
				592
				593	Encoding mappings must map single Unicode characters to single string
				594	characters, integers (which are then interpreted as Latin-1 ordinals) or None
				595	(meaning "undefined mapping" and causing an error).
				596
The mapping objects provided need only support the __getitem__ mapping
interface.
				599
If a character lookup fails with a LookupError, the character is copied as-is,
meaning that its ordinal value will be interpreted as a Unicode ordinal when
decoding or as a Latin-1 ordinal when encoding. Because of this, mappings only
need to contain those entries which map characters to different code points.
				604
				605
.. cfunction:: PyObject* PyUnicode_DecodeCharmap(const char *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				607
Create a Unicode object by decoding size bytes of the encoded string s using
the given mapping object. Return NULL if an exception was raised by the
codec. If mapping is NULL, Latin-1 decoding will be done. Otherwise it can be a
dictionary mapping bytes, or a unicode string that is treated as a lookup table.
Byte values greater than the length of the string and U+FFFE "characters" are
treated as "undefined mapping".
				614
				615	.. versionchanged:: 2.4
				616	Allowed unicode string as mapping argument.
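
The lookup-table form of the mapping can be pictured with a standalone model. The sketch below is plain C, not the codec itself: ``charmap_decode`` and ``UNDEFINED_MAPPING`` are hypothetical names, the table is an array of 16-bit units, and treating every index at or beyond the table length as undefined is an assumption of the model. Instead of invoking an error handler, it simply stops at the first undefined mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UNDEFINED_MAPPING 0xFFFE   /* U+FFFE marks "undefined mapping" */

/* Hypothetical model: decode size bytes of s through a lookup table.
 * out needs room for size units.  Returns the number of units written,
 * or -1 at the first byte whose mapping is undefined. */
long
charmap_decode(const unsigned char *s, size_t size,
               const uint16_t *table, size_t table_len,
               uint16_t *out)
{
    size_t i;

    assert(table != NULL && out != NULL);
    assert(s != NULL || size == 0);
    for (i = 0; i < size; i++) {
        size_t b = (size_t)s[i];
        if (b >= table_len || table[b] == UNDEFINED_MAPPING)
            return -1;          /* outside the table, or marked undefined */
        out[i] = table[b];
    }
    return (long)size;
}
```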
				617
				618
.. cfunction:: PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors)
				620
				621	Encode the :ctype:`Py_UNICODE` buffer of the given size using the given
				622	mapping object and return a Python string object. Return NULL if an
				623	exception was raised by the codec.
				624
				625
.. cfunction:: PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping)
				627
				628	Encode a Unicode object using the given mapping object and return the result
				629	as Python string object. Error handling is "strict". Return NULL if an
				630	exception was raised by the codec.
				631
The following codec API is special in that it maps Unicode to Unicode.
				633
				634
.. cfunction:: PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *table, const char *errors)
				636
				637	Translate a :ctype:`Py_UNICODE` buffer of the given length by applying a
				638	character mapping table to it and return the resulting Unicode object. Return
				639	NULL when an exception was raised by the codec.
				640
				641	The mapping table must map Unicode ordinal integers to Unicode ordinal
				642	integers or None (causing deletion of the character).
				643
				644	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				645	and sequences work well. Unmapped character ordinals (ones which cause a
				646	:exc:`LookupError`) are left untouched and are copied as-is.
				647
				648	These are the MBCS codec APIs. They are currently only available on Windows and
				649	use the Win32 MBCS converters to implement the conversions. Note that MBCS (or
				650	DBCS) is a class of encodings, not just one. The target encoding is defined by
				651	the user settings on the machine running the codec.
				652
				653	.. % --- MBCS codecs for Windows --------------------------------------------
				654
				655
.. cfunction:: PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors)
				657
				658	Create a Unicode object by decoding size bytes of the MBCS encoded string s.
				659	Return NULL if an exception was raised by the codec.
				660
				661
.. cfunction:: PyObject* PyUnicode_DecodeMBCSStateful(const char *s, int size, const char *errors, int *consumed)
				663
If consumed is NULL, behave like :cfunc:`PyUnicode_DecodeMBCS`. If
consumed is not NULL, :cfunc:`PyUnicode_DecodeMBCSStateful` will not decode
a trailing lead byte, and the number of bytes that have been decoded will be
stored in consumed.
				668
				669	.. versionadded:: 2.5
				670
				671
.. cfunction:: PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors)
				673
				674	Encode the :ctype:`Py_UNICODE` buffer of the given size using MBCS and return a
				675	Python string object. Return NULL if an exception was raised by the codec.
				676
				677
				678	.. cfunction:: PyObject* PyUnicode_AsMBCSString(PyObject *unicode)
				679
				680	Encode a Unicode object using MBCS and return the result as Python string
				681	object. Error handling is "strict". Return NULL if an exception was raised
				682	by the codec.
				683
				684	.. % --- Methods & Slots ----------------------------------------------------
				685
				686
				687	.. _unicodemethodsandslots:
				688
				689	Methods and Slot Functions
				690	^^^^^^^^^^^^^^^^^^^^^^^^^^
				691
				692	The following APIs are capable of handling Unicode objects and strings on input
				693	(we refer to them as strings in the descriptions) and return Unicode objects or
				694	integers as appropriate.
				695
				696	They all return NULL or ``-1`` if an exception occurs.
				697
				698
.. cfunction:: PyObject* PyUnicode_Concat(PyObject *left, PyObject *right)
				700
Concatenate two strings, giving a new Unicode string.
				702
				703
.. cfunction:: PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit)
				705
Split a string, giving a list of Unicode strings. If sep is NULL, splitting
will be done at all whitespace substrings. Otherwise, splits occur at the given
separator. At most maxsplit splits will be done. If maxsplit is negative, no
limit is set. Separators are not included in the resulting list.
				710
				711
				712	.. cfunction:: PyObject* PyUnicode_Splitlines(PyObject *s, int keepend)
				713
Split a Unicode string at line breaks, returning a list of Unicode strings.
CRLF is considered to be one line break. If keepend is 0, the line break
characters are not included in the resulting strings.
				717
				718
				719	.. cfunction:: PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors)
				720
				721	Translate a string by applying a character mapping table to it and return the
				722	resulting Unicode object.
				723
				724	The mapping table must map Unicode ordinal integers to Unicode ordinal integers
				725	or None (causing deletion of the character).
				726
				727	Mapping tables need only provide the :meth:`__getitem__` interface; dictionaries
				728	and sequences work well. Unmapped character ordinals (ones which cause a
				729	:exc:`LookupError`) are left untouched and are copied as-is.
				730
				731	*errors* has the usual meaning for codecs. It may be *NULL*, which indicates
				732	that the default error handling should be used.
				733
				734
				735	.. cfunction:: PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq)
				736
				737	Join a sequence of strings using the given separator and return the resulting
				738	Unicode string.
				739
				740
				741	.. cfunction:: int PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				742
				743	Return 1 if *substr* matches ``str[start:end]`` at the given tail end
				744	(*direction* == -1 means to do a prefix match, *direction* == 1 a suffix match),
				745	0 otherwise. Return ``-1`` if an error occurred.
				746
				747
				748	.. cfunction:: Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction)
				749
				750	Return the first position of *substr* in ``str[start:end]`` using the given
				751	*direction* (*direction* == 1 means to do a forward search, *direction* == -1 a
				752	backward search). The return value is the index of the first match; a value of
				753	``-1`` indicates that no match was found, and ``-2`` indicates that an error
				754	occurred and an exception has been set.
				755
				756
				757	.. cfunction:: Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end)
				758
				759	Return the number of non-overlapping occurrences of *substr* in
				760	``str[start:end]``. Return ``-1`` if an error occurred.
				761
				762
				763	.. cfunction:: PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount)
				764
				765	Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and
				766	return the resulting Unicode object. *maxcount* == -1 means replace all
				767	occurrences.
				768
				769
				770	.. cfunction:: int PyUnicode_Compare(PyObject *left, PyObject *right)
				771
				772	Compare two strings and return -1, 0, 1 for less than, equal, and greater than,
				773	respectively.
				774
				775
				776	.. cfunction:: PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op)
				777
				778	Rich compare two Unicode strings and return one of the following:
				779
				780	* ``NULL`` in case an exception was raised
				781	* :const:`Py_True` or :const:`Py_False` for successful comparisons
				782	* :const:`Py_NotImplemented` in case the type combination is unknown
				783
				784	Note that :const:`Py_EQ` and :const:`Py_NE` comparisons can cause a
				785	:exc:`UnicodeWarning` in case the conversion of the arguments to Unicode fails
				786	with a :exc:`UnicodeDecodeError`.
				787
				788	Possible values for *op* are :const:`Py_GT`, :const:`Py_GE`, :const:`Py_EQ`,
				789	:const:`Py_NE`, :const:`Py_LT`, and :const:`Py_LE`.
				790
				791
				792	.. cfunction:: PyObject* PyUnicode_Format(PyObject *format, PyObject *args)
				793
				794	Return a new string object from *format* and *args*; this is analogous to
				795	``format % args``. The *args* argument must be a tuple.
				796
				797
				798	.. cfunction:: int PyUnicode_Contains(PyObject *container, PyObject *element)
				799
				800	Check whether *element* is contained in *container* and return true or false
				801	accordingly.
				802
				803	*element* has to coerce to a one-element Unicode string. ``-1`` is returned if
				804	there was an error.