Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	1	\section{\module{codecs} ---
Fred Drake	69ca950	2000-04-06 16:09:59 +0000	[diff] [blame]	2	Codec registry and base classes}
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	3
Fred Drake	69ca950	2000-04-06 16:09:59 +0000	[diff] [blame]	4	\declaremodule{standard}{codecs}
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	5	\modulesynopsis{Encode and decode data and streams.}
				6	\moduleauthor{Marc-Andre Lemburg}{mal@lemburg.com}
				7	\sectionauthor{Marc-Andre Lemburg}{mal@lemburg.com}
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	8	\sectionauthor{Martin v. L\"owis}{martin@v.loewis.de}
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	9
				10	\index{Unicode}
				11	\index{Codecs}
				12	\indexii{Codecs}{encode}
				13	\indexii{Codecs}{decode}
				14	\index{streams}
				15	\indexii{stackable}{streams}
				16
				17
				18	This module defines base classes for standard Python codecs (encoders
				19	and decoders) and provides access to the internal Python codec
Walter Dörwald	3aeb632	2002-09-02 13:14:32 +0000	[diff] [blame]	20	registry which manages the codec and error handling lookup process.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	21
				22	It defines the following functions:
				23
				24	\begin{funcdesc}{register}{search_function}
				25	Register a codec search function. Search functions are expected to
				26	take one argument, the encoding name in all lower case letters, and
				27	return a tuple of functions \code{(\var{encoder}, \var{decoder}, \var{stream_reader},
				28	\var{stream_writer})} taking the following arguments:
				29
				30	\var{encoder} and \var{decoder}: These must be functions or methods
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	31	which have the same interface as the
				32	\method{encode()}/\method{decode()} methods of Codec instances (see
				33	Codec Interface). The functions/methods are expected to work in a
				34	stateless mode.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	35
				36	\var{stream_reader} and \var{stream_writer}: These have to be
				37	factory functions providing the following interface:
				38
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	39	\code{factory(\var{stream}, \var{errors}='strict')}
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	40
				41	The factory functions must return objects providing the interfaces
Fred Drake	69ca950	2000-04-06 16:09:59 +0000	[diff] [blame]	42	defined by the base classes \class{StreamWriter} and
				43	\class{StreamReader}, respectively. Stream codecs can maintain
				44	state.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	45
Fred Drake	69ca950	2000-04-06 16:09:59 +0000	[diff] [blame]	46	Possible values for errors are \code{'strict'} (raise an exception
				47	in case of an encoding error), \code{'replace'} (replace malformed
Walter Dörwald	72f8616	2002-11-19 21:51:35 +0000	[diff] [blame]	48	data with a suitable replacement marker, such as \character{?}),
Fred Drake	69ca950	2000-04-06 16:09:59 +0000	[diff] [blame]	49	\code{'ignore'} (ignore malformed data and continue without further
Walter Dörwald	72f8616	2002-11-19 21:51:35 +0000	[diff] [blame]	50	notice), \code{'xmlcharrefreplace'} (replace with the appropriate XML
				51	character reference (for encoding only)) and \code{'backslashreplace'}
				52	(replace with backslashed escape sequences (for encoding only)) as
				53	well as any other error handling name defined via
				54	\function{register_error()}.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	55
				56	In case a search function cannot find a given encoding, it should
Fred Drake	69ca950	2000-04-06 16:09:59 +0000	[diff] [blame]	57	return \code{None}.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	58	\end{funcdesc}
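The following is a minimal sketch of a search function, written for present-day Python~3 (an assumption; the interface described here predates it). The codec name \code{demo_latin} and the function \code{demo_search} are hypothetical. The function simply hands back the existing Latin-1 registry entry, which unpacks into the four items listed above.

```python
import codecs

def demo_search(name):
    """Hypothetical search function: serve 'demo_latin' as an alias for Latin-1."""
    # The registry passes the encoding name in lower case.
    if name == "demo_latin":
        # Reuse an existing registry entry; it unpacks as
        # (encoder, decoder, stream_reader, stream_writer).
        return codecs.lookup("latin-1")
    # Unknown encodings are signalled by returning None.
    return None

codecs.register(demo_search)

# The new name can now be resolved through the registry.
encoder = codecs.getencoder("demo_latin")
data, consumed = encoder("caf\xe9")
print(data, consumed)
```

Returning \code{None} for every other name lets the registry move on to the next registered search function.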
				59
				60	\begin{funcdesc}{lookup}{encoding}
				61	Looks up a codec tuple in the Python codec registry and returns the
				62	function tuple as defined above.
				63
				64	Encodings are first looked up in the registry's cache. If not found,
				65	the list of registered search functions is scanned. If no codecs tuple
Fred Drake	69ca950	2000-04-06 16:09:59 +0000	[diff] [blame]	66	is found, a \exception{LookupError} is raised. Otherwise, the codecs
				67	tuple is stored in the cache and returned to the caller.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	68	\end{funcdesc}
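As a rough illustration, assuming present-day Python~3 (where the returned object still unpacks like the four-item tuple described above), a lookup and its failure mode might look like this. The encoding name \code{no-such-encoding-demo} is made up for the example.

```python
import codecs

# The registry entry unpacks into the four items described above.
encoder, decoder, stream_reader, stream_writer = codecs.lookup("utf-8")

# Both stateless functions return (output object, length consumed).
encoded, consumed = encoder("h\xe9")
decoded, used = decoder(encoded)

# An unknown name raises LookupError after all search functions have failed.
try:
    codecs.lookup("no-such-encoding-demo")
    found = True
except LookupError:
    found = False
```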
				69
Skip Montanaro	b02ea65	2002-04-17 19:33:06 +0000	[diff] [blame]	70	To simplify access to the various codecs, the module provides these
Marc-André Lemburg	494f2ae	2001-09-19 11:33:31 +0000	[diff] [blame]	71	additional functions which use \function{lookup()} for the codec
				72	lookup:
				73
				74	\begin{funcdesc}{getencoder}{encoding}
Look up the codec for the given encoding and return its encoder
				76	function.
				77
				78	Raises a \exception{LookupError} in case the encoding cannot be found.
				79	\end{funcdesc}
				80
				81	\begin{funcdesc}{getdecoder}{encoding}
Look up the codec for the given encoding and return its decoder
				83	function.
				84
				85	Raises a \exception{LookupError} in case the encoding cannot be found.
				86	\end{funcdesc}
				87
				88	\begin{funcdesc}{getreader}{encoding}
Look up the codec for the given encoding and return its StreamReader
				90	class or factory function.
				91
				92	Raises a \exception{LookupError} in case the encoding cannot be found.
				93	\end{funcdesc}
				94
				95	\begin{funcdesc}{getwriter}{encoding}
Look up the codec for the given encoding and return its StreamWriter
				97	class or factory function.
				98
				99	Raises a \exception{LookupError} in case the encoding cannot be found.
				100	\end{funcdesc}
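A short sketch of the stream helpers, assuming present-day Python~3 and an in-memory \code{io.BytesIO} object standing in for a real file:

```python
import codecs
import io

# getwriter() returns a factory; calling it wraps a binary stream.
buffer = io.BytesIO()
writer = codecs.getwriter("utf-8")(buffer)
writer.write("na\xefve")
encoded = buffer.getvalue()

# getreader() works the same way for the decoding direction.
reader = codecs.getreader("utf-8")(io.BytesIO(encoded))
text = reader.read()
```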
				101
Walter Dörwald	3aeb632	2002-09-02 13:14:32 +0000	[diff] [blame]	102	\begin{funcdesc}{register_error}{name, error_handler}
				103	Register the error handling function \var{error_handler} under the
Raymond Hettinger	8a64d40	2002-09-08 22:26:13 +0000	[diff] [blame]	104	name \var{name}. \var{error_handler} will be called during encoding
Walter Dörwald	3aeb632	2002-09-02 13:14:32 +0000	[diff] [blame]	105	and decoding in case of an error, when \var{name} is specified as the
Walter Dörwald	2e0b18a	2003-01-31 17:19:08 +0000	[diff] [blame]	106	errors parameter.
				107
For encoding, \var{error_handler} will be called with a
				109	\exception{UnicodeEncodeError} instance, which contains information about
				110	the location of the error. The error handler must either raise this or
				111	a different exception or return a tuple with a replacement for the
				112	unencodable part of the input and a position where encoding should
				113	continue. The encoder will encode the replacement and continue encoding
				114	the original input at the specified position. Negative position values
				115	will be treated as being relative to the end of the input string. If the
resulting position is out of bounds, an \exception{IndexError} will be raised.
				117
Decoding and translating work similarly, except that
\exception{UnicodeDecodeError} or \exception{UnicodeTranslateError} will be
passed to the handler and the replacement from the error handler will be
put into the output directly.
Walter Dörwald	3aeb632	2002-09-02 13:14:32 +0000	[diff] [blame]	122	\end{funcdesc}
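A minimal sketch of a custom handler, assuming present-day Python~3. The handler name \code{demo_underscore} and the function \code{underscore_errors} are hypothetical. The handler returns a replacement plus the position at which encoding should resume, as described above.

```python
import codecs

def underscore_errors(exc):
    """Hypothetical handler: replace each unencodable character with '_'."""
    if isinstance(exc, UnicodeEncodeError):
        # exc.start and exc.end delimit the unencodable slice of the input.
        replacement = "_" * (exc.end - exc.start)
        # Resume encoding right after the offending slice.
        return replacement, exc.end
    # This sketch only supports encoding; re-raise anything else.
    raise exc

codecs.register_error("demo_underscore", underscore_errors)

result = "a\u20acb".encode("ascii", "demo_underscore")
```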
				123
				124	\begin{funcdesc}{lookup_error}{name}
Return the error handler previously registered under the name \var{name}.
				126
				127	Raises a \exception{LookupError} in case the handler cannot be found.
				128	\end{funcdesc}
				129
				130	\begin{funcdesc}{strict_errors}{exception}
				131	Implements the \code{strict} error handling.
				132	\end{funcdesc}
				133
				134	\begin{funcdesc}{replace_errors}{exception}
				135	Implements the \code{replace} error handling.
				136	\end{funcdesc}
				137
				138	\begin{funcdesc}{ignore_errors}{exception}
				139	Implements the \code{ignore} error handling.
				140	\end{funcdesc}
				141
\begin{funcdesc}{xmlcharrefreplace_errors}{exception}
				143	Implements the \code{xmlcharrefreplace} error handling.
				144	\end{funcdesc}
				145
\begin{funcdesc}{backslashreplace_errors}{exception}
				147	Implements the \code{backslashreplace} error handling.
				148	\end{funcdesc}
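The predefined handlers can also be called directly, which makes their return values easy to inspect. The sketch below assumes present-day Python~3 and builds a \exception{UnicodeEncodeError} by hand purely for illustration.

```python
import codecs

# Describe a failure at position 1 (the euro sign) of the input "a\u20ac".
exc = UnicodeEncodeError("ascii", "a\u20ac", 1, 2, "ordinal not in range(128)")

# Each handler returns (replacement, position at which to continue).
ignored = codecs.ignore_errors(exc)
replaced = codecs.replace_errors(exc)
xml_ref = codecs.xmlcharrefreplace_errors(exc)
escaped = codecs.backslashreplace_errors(exc)

# The strict handler raises the exception it is given.
try:
    codecs.strict_errors(exc)
    raised = False
except UnicodeEncodeError:
    raised = True

# Handlers are retrieved by name with lookup_error().
handler = codecs.lookup_error("replace")
```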
				149
To simplify working with encoded files or streams, the module
				151	also defines these utility functions:
				152
Fred Drake	e1b304d	2000-07-24 19:35:52 +0000	[diff] [blame]	153	\begin{funcdesc}{open}{filename, mode\optional{, encoding\optional{,
				154	errors\optional{, buffering}}}}
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	155	Open an encoded file using the given \var{mode} and return
				156	a wrapped version providing transparent encoding/decoding.
				157
Fred Drake	0aa811c	2001-10-20 04:24:09 +0000	[diff] [blame]	158	\note{The wrapped version will only accept the object format
Fred Drake	e1b304d	2000-07-24 19:35:52 +0000	[diff] [blame]	159	defined by the codecs, i.e.\ Unicode objects for most built-in
				160	codecs. Output is also codec-dependent and will usually be Unicode as
Fred Drake	0aa811c	2001-10-20 04:24:09 +0000	[diff] [blame]	161	well.}
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	162
				163	\var{encoding} specifies the encoding which is to be used for the
Raymond Hettinger	7e43110	2003-09-22 15:00:55 +0000	[diff] [blame]	164	file.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	165
				166	\var{errors} may be given to define the error handling. It defaults
Fred Drake	e1b304d	2000-07-24 19:35:52 +0000	[diff] [blame]	167	to \code{'strict'} which causes a \exception{ValueError} to be raised
				168	in case an encoding error occurs.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	169
Fred Drake	69ca950	2000-04-06 16:09:59 +0000	[diff] [blame]	170	\var{buffering} has the same meaning as for the built-in
				171	\function{open()} function. It defaults to line buffered.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	172	\end{funcdesc}
				173
Fred Drake	e1b304d	2000-07-24 19:35:52 +0000	[diff] [blame]	174	\begin{funcdesc}{EncodedFile}{file, input\optional{,
				175	output\optional{, errors}}}
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	176	Return a wrapped version of file which provides transparent
				177	encoding translation.
				178
				179	Strings written to the wrapped file are interpreted according to the
				180	given \var{input} encoding and then written to the original file as
Fred Drake	e1b304d	2000-07-24 19:35:52 +0000	[diff] [blame]	181	strings using the \var{output} encoding. The intermediate encoding will
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	182	usually be Unicode but depends on the specified codecs.
				183
Fred Drake	e1b304d	2000-07-24 19:35:52 +0000	[diff] [blame]	184	If \var{output} is not given, it defaults to \var{input}.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	185
				186	\var{errors} may be given to define the error handling. It defaults to
Fred Drake	e1b304d	2000-07-24 19:35:52 +0000	[diff] [blame]	187	\code{'strict'}, which causes \exception{ValueError} to be raised in case
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	188	an encoding error occurs.
				189	\end{funcdesc}
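A minimal sketch of \function{EncodedFile()}, assuming present-day Python~3, where the wrapped file exchanges byte strings with its caller. An in-memory \code{io.BytesIO} object stands in for a real file holding Latin-1 data.

```python
import codecs
import io

# The underlying "file" holds Latin-1 encoded data.
raw = io.BytesIO(b"caf\xe9")

# Callers see UTF-8 (input encoding); the file itself uses Latin-1 (output).
wrapped = codecs.EncodedFile(raw, "utf-8", "latin-1")
utf8_data = wrapped.read()
```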
				190
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	191	The module also provides the following constants which are useful
for reading and writing platform-dependent files:
				193
				194	\begin{datadesc}{BOM}
				195	\dataline{BOM_BE}
				196	\dataline{BOM_LE}
Walter Dörwald	474458d	2002-06-04 15:16:29 +0000	[diff] [blame]	197	\dataline{BOM_UTF8}
				198	\dataline{BOM_UTF16}
				199	\dataline{BOM_UTF16_BE}
				200	\dataline{BOM_UTF16_LE}
				201	\dataline{BOM_UTF32}
				202	\dataline{BOM_UTF32_BE}
				203	\dataline{BOM_UTF32_LE}
				204	These constants define various encodings of the Unicode byte order mark
				205	(BOM) used in UTF-16 and UTF-32 data streams to indicate the byte order
				206	used in the stream or file and in UTF-8 as a Unicode signature.
				207	\constant{BOM_UTF16} is either \constant{BOM_UTF16_BE} or
				208	\constant{BOM_UTF16_LE} depending on the platform's native byte order,
				209	\constant{BOM} is an alias for \constant{BOM_UTF16}, \constant{BOM_LE}
				210	for \constant{BOM_UTF16_LE} and \constant{BOM_BE} for \constant{BOM_UTF16_BE}.
				211	The others represent the BOM in UTF-8 and UTF-32 encodings.
Fred Drake	b7979c7	2000-04-06 14:21:58 +0000	[diff] [blame]	212	\end{datadesc}
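One common use of these constants is to guess an encoding from the first bytes of a file. The helper \code{sniff_bom} below is a hypothetical sketch for present-day Python~3, not part of the module.

```python
import codecs
import sys

def sniff_bom(data):
    """Hypothetical helper: return (encoding name, BOM length) or (None, 0)."""
    # UTF-32 must be tested before UTF-16: BOM_UTF32_LE begins with BOM_UTF16_LE.
    candidates = [
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in candidates:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

# BOM_UTF16 follows the platform's native byte order.
native = codecs.BOM_UTF16_LE if sys.byteorder == "little" else codecs.BOM_UTF16_BE
```

The order of the candidates matters: testing the UTF-16 marks first would misreport little-endian UTF-32 data as UTF-16.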
				213
Fred Drake	dc40ac0	2001-01-22 20:17:54 +0000	[diff] [blame]	214
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	215	\subsection{Codec Base Classes}
				216
The \module{codecs} module defines a set of base classes which define the
interface and can also be used to easily write your own codecs for use
				219	in Python.
				220
Each codec has to define four interfaces to make it usable as a codec in
				222	Python: stateless encoder, stateless decoder, stream reader and stream
				223	writer. The stream reader and writers typically reuse the stateless
				224	encoder/decoder to implement the file protocols.
				225
				226	The \class{Codec} class defines the interface for stateless
				227	encoders/decoders.
				228
				229	To simplify and standardize error handling, the \method{encode()} and
				230	\method{decode()} methods may implement different error handling
				231	schemes by providing the \var{errors} string argument. The following
				232	string values are defined and implemented by all standard Python
				233	codecs:
				234
Fred Drake	dc40ac0	2001-01-22 20:17:54 +0000	[diff] [blame]	235	\begin{tableii}{l\|l}{code}{Value}{Meaning}
Walter Dörwald	430b156	2002-11-07 22:33:17 +0000	[diff] [blame]	236	\lineii{'strict'}{Raise \exception{UnicodeError} (or a subclass);
Fred Drake	dc40ac0	2001-01-22 20:17:54 +0000	[diff] [blame]	237	this is the default.}
				238	\lineii{'ignore'}{Ignore the character and continue with the next.}
				239	\lineii{'replace'}{Replace with a suitable replacement character;
				240	Python will use the official U+FFFD REPLACEMENT
Walter Dörwald	430b156	2002-11-07 22:33:17 +0000	[diff] [blame]	241	CHARACTER for the built-in Unicode codecs on
				242	decoding and '?' on encoding.}
				243	\lineii{'xmlcharrefreplace'}{Replace with the appropriate XML
				244	character reference (only for encoding).}
				245	\lineii{'backslashreplace'}{Replace with backslashed escape sequences
				246	(only for encoding).}
Fred Drake	dc40ac0	2001-01-22 20:17:54 +0000	[diff] [blame]	247	\end{tableii}
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	248
The set of allowed values can be extended via \function{register_error()}.
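The effect of each value can be seen with the built-in \code{ascii} codec. This is a small sketch for present-day Python~3, using string methods rather than the \class{Codec} interface directly.

```python
text = "a\u20ac"  # 'a' followed by the euro sign, which ASCII cannot encode

# Encoding with each of the non-strict schemes from the table above.
results = {
    name: text.encode("ascii", name)
    for name in ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
}

# On decoding, 'replace' inserts U+FFFD REPLACEMENT CHARACTER.
decoded = b"a\xff".decode("ascii", "replace")

# 'strict' is the default and raises a UnicodeError subclass.
try:
    text.encode("ascii")
    strict_error = None
except UnicodeError as exc:
    strict_error = exc
```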
				250
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	251
				252	\subsubsection{Codec Objects \label{codec-objects}}
				253
				254	The \class{Codec} class defines these methods which also define the
				255	function interfaces of the stateless encoder and decoder:
				256
				257	\begin{methoddesc}{encode}{input\optional{, errors}}
				258	Encodes the object \var{input} and returns a tuple (output object,
Skip Montanaro	6c7bc31	2002-04-16 15:12:10 +0000	[diff] [blame]	259	length consumed). While codecs are not restricted to use with Unicode, in
				260	a Unicode context, encoding converts a Unicode object to a plain string
				261	using a particular character set encoding (e.g., \code{cp1252} or
				262	\code{iso-8859-1}).
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	263
				264	\var{errors} defines the error handling to apply. It defaults to
				265	\code{'strict'} handling.
				266
The method may not store state in the \class{Codec} instance. Use
\class{StreamWriter} for codecs which have to keep state in order to
make encoding efficient.
				270
				271	The encoder must be able to handle zero length input and return an
				272	empty object of the output object type in this situation.
				273	\end{methoddesc}
				274
				275	\begin{methoddesc}{decode}{input\optional{, errors}}
				276	Decodes the object \var{input} and returns a tuple (output object,
Skip Montanaro	6c7bc31	2002-04-16 15:12:10 +0000	[diff] [blame]	277	length consumed). In a Unicode context, decoding converts a plain string
				278	encoded using a particular character set encoding to a Unicode object.
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	279
				280	\var{input} must be an object which provides the \code{bf_getreadbuf}
				281	buffer slot. Python strings, buffer objects and memory mapped files
				282	are examples of objects providing this slot.
				283
				284	\var{errors} defines the error handling to apply. It defaults to
				285	\code{'strict'} handling.
				286
The method may not store state in the \class{Codec} instance. Use
\class{StreamReader} for codecs which have to keep state in order to
make decoding efficient.
				290
				291	The decoder must be able to handle zero length input and return an
				292	empty object of the output object type in this situation.
				293	\end{methoddesc}
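A minimal sketch of a stateless codec, assuming present-day Python~3. The class \code{HexTextCodec} is hypothetical: it encodes text as the hexadecimal digits of its UTF-8 bytes. Both methods return the (output object, length consumed) tuple and handle zero-length input.

```python
import binascii
import codecs

class HexTextCodec(codecs.Codec):
    """Hypothetical stateless codec: text <-> hex digits of its UTF-8 bytes."""

    def encode(self, input, errors="strict"):
        data = binascii.hexlify(input.encode("utf-8", errors))
        # Length consumed is counted in units of the input object.
        return data, len(input)

    def decode(self, input, errors="strict"):
        raw = bytes(input)  # accept any object exposing the buffer interface
        text = binascii.unhexlify(raw).decode("utf-8", errors)
        return text, len(raw)

codec = HexTextCodec()
encoded = codec.encode("hi")
decoded = codec.decode(b"6869")
empty = codec.encode("")
```

Because no state is kept on the instance, the same object can be used for any number of unrelated inputs.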
				294
				295	The \class{StreamWriter} and \class{StreamReader} classes provide
				296	generic working interfaces which can be used to implement new
encoding submodules very easily. See \module{encodings.utf_8} for an
example of how this is done.
				299
				300
				301	\subsubsection{StreamWriter Objects \label{stream-writer-objects}}
				302
				303	The \class{StreamWriter} class is a subclass of \class{Codec} and
				304	defines the following methods which every stream writer must define in
order to be compatible with the Python codec registry.
				306
				307	\begin{classdesc}{StreamWriter}{stream\optional{, errors}}
				308	Constructor for a \class{StreamWriter} instance.
				309
				310	All stream writers must provide this constructor interface. They are
				311	free to add additional keyword arguments, but only the ones defined
				312	here are used by the Python codec registry.
				313
				314	\var{stream} must be a file-like object open for writing (binary)
				315	data.
				316
				317	The \class{StreamWriter} may implement different error handling
				318	schemes by providing the \var{errors} keyword argument. These
Walter Dörwald	430b156	2002-11-07 22:33:17 +0000	[diff] [blame]	319	parameters are predefined:
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	320
				321	\begin{itemize}
				322	\item \code{'strict'} Raise \exception{ValueError} (or a subclass);
				323	this is the default.
				324	\item \code{'ignore'} Ignore the character and continue with the next.
\item \code{'replace'} Replace with a suitable replacement character.
\item \code{'xmlcharrefreplace'} Replace with the appropriate XML
character reference.
				328	\item \code{'backslashreplace'} Replace with backslashed escape sequences.
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	329	\end{itemize}
Walter Dörwald	430b156	2002-11-07 22:33:17 +0000	[diff] [blame]	330
				331	The \var{errors} argument will be assigned to an attribute of the
				332	same name. Assigning to this attribute makes it possible to switch
				333	between different error handling strategies during the lifetime
				334	of the \class{StreamWriter} object.
				335
				336	The set of allowed values for the \var{errors} argument can
				337	be extended with \function{register_error()}.
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	338	\end{classdesc}
				339
				340	\begin{methoddesc}{write}{object}
				341	Writes the object's contents encoded to the stream.
				342	\end{methoddesc}
				343
				344	\begin{methoddesc}{writelines}{list}
				345	Writes the concatenated list of strings to the stream (possibly by
				346	reusing the \method{write()} method).
				347	\end{methoddesc}
				348
				349	\begin{methoddesc}{reset}{}
				350	Flushes and resets the codec buffers used for keeping state.
				351
				352	Calling this method should ensure that the data on the output is put
into a clean state that allows appending fresh data without
				354	having to rescan the whole stream to recover state.
				355	\end{methoddesc}
				356
				357	In addition to the above methods, the \class{StreamWriter} must also
inherit all other methods and attributes from the underlying stream.
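The sketch below, assuming present-day Python~3 and an in-memory \code{io.BytesIO} stream, shows \method{write()} and \method{writelines()} together with a change of the \code{errors} attribute partway through the writer's lifetime.

```python
import codecs
import io

buffer = io.BytesIO()
writer = codecs.getwriter("ascii")(buffer, errors="replace")

# The euro sign cannot be encoded in ASCII, so 'replace' emits '?'.
writer.write("a\u20ac")

# Switch the error handling strategy on the live object.
writer.errors = "xmlcharrefreplace"
writer.writelines(["b", "\u20ac"])

output = buffer.getvalue()
```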
				359
				360
				361	\subsubsection{StreamReader Objects \label{stream-reader-objects}}
				362
				363	The \class{StreamReader} class is a subclass of \class{Codec} and
				364	defines the following methods which every stream reader must define in
order to be compatible with the Python codec registry.
				366
				367	\begin{classdesc}{StreamReader}{stream\optional{, errors}}
				368	Constructor for a \class{StreamReader} instance.
				369
				370	All stream readers must provide this constructor interface. They are
				371	free to add additional keyword arguments, but only the ones defined
				372	here are used by the Python codec registry.
				373
				374	\var{stream} must be a file-like object open for reading (binary)
				375	data.
				376
				377	The \class{StreamReader} may implement different error handling
				378	schemes by providing the \var{errors} keyword argument. These
				379	parameters are defined:
				380
				381	\begin{itemize}
				382	\item \code{'strict'} Raise \exception{ValueError} (or a subclass);
				383	this is the default.
				384	\item \code{'ignore'} Ignore the character and continue with the next.
				385	\item \code{'replace'} Replace with a suitable replacement character.
				386	\end{itemize}
Walter Dörwald	430b156	2002-11-07 22:33:17 +0000	[diff] [blame]	387
				388	The \var{errors} argument will be assigned to an attribute of the
				389	same name. Assigning to this attribute makes it possible to switch
				390	between different error handling strategies during the lifetime
				391	of the \class{StreamReader} object.
				392
				393	The set of allowed values for the \var{errors} argument can
				394	be extended with \function{register_error()}.
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	395	\end{classdesc}
				396
				397	\begin{methoddesc}{read}{\optional{size}}
				398	Decodes data from the stream and returns the resulting object.
				399
				400	\var{size} indicates the approximate maximum number of bytes to read
				401	from the stream for decoding purposes. The decoder can modify this
				402	setting as appropriate. The default value -1 indicates to read and
				403	decode as much as possible. \var{size} is intended to prevent having
				404	to decode huge files in one step.
				405
				406	The method should use a greedy read strategy meaning that it should
				407	read as much data as is allowed within the definition of the encoding
				408	and the given size, e.g. if optional encoding endings or state
				409	markers are available on the stream, these should be read too.
				410	\end{methoddesc}
				411
\begin{methoddesc}{readline}{\optional{size}}
				413	Read one line from the input stream and return the
				414	decoded data.
				415
Fred Drake	0aa811c	2001-10-20 04:24:09 +0000	[diff] [blame]	416	Unlike the \method{readlines()} method, this method inherits
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	417	the line breaking knowledge from the underlying stream's
				418	\method{readline()} method -- there is currently no support for line
				419	breaking using the codec decoder due to lack of line buffering.
Subclasses should however, if possible, try to implement this method
				421	using their own knowledge of line breaking.
				422
				423	\var{size}, if given, is passed as size argument to the stream's
				424	\method{readline()} method.
				425	\end{methoddesc}
				426
\begin{methoddesc}{readlines}{\optional{sizehint}}
Read all lines available on the input stream and return them as a list
				429	of lines.
				430
				431	Line breaks are implemented using the codec's decoder method and are
				432	included in the list entries.
				433
				434	\var{sizehint}, if given, is passed as \var{size} argument to the
				435	stream's \method{read()} method.
				436	\end{methoddesc}
				437
				438	\begin{methoddesc}{reset}{}
				439	Resets the codec buffers used for keeping state.
				440
				441	Note that no stream repositioning should take place. This method is
				442	primarily intended to be able to recover from decoding errors.
				443	\end{methoddesc}
				444
				445	In addition to the above methods, the \class{StreamReader} must also
inherit all other methods and attributes from the underlying stream.
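A short sketch of the reading methods, assuming present-day Python~3 (whose line handling is more capable than described above) and an in-memory \code{io.BytesIO} stream:

```python
import codecs
import io

raw = "first\nsecond\nthird\n".encode("utf-8")
reader = codecs.getreader("utf-8")(io.BytesIO(raw))

# readline() returns one decoded line, including its line break.
first = reader.readline()

# readlines() returns the remaining lines as a list.
rest = reader.readlines()
```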
				447
				448	The next two base classes are included for convenience. They are not
needed by the codec registry, but may prove useful in practice.
				450
				451
				452	\subsubsection{StreamReaderWriter Objects \label{stream-reader-writer}}
				453
				454	The \class{StreamReaderWriter} allows wrapping streams which work in
				455	both read and write modes.
				456
				457	The design is such that one can use the factory functions returned by
				458	the \function{lookup()} function to construct the instance.
				459
				460	\begin{classdesc}{StreamReaderWriter}{stream, Reader, Writer, errors}
				461	Creates a \class{StreamReaderWriter} instance.
				462	\var{stream} must be a file-like object.
\var{Reader} and \var{Writer} must be factory functions or classes
providing the \class{StreamReader} and \class{StreamWriter} interfaces,
respectively.
				466	Error handling is done in the same way as defined for the
				467	stream readers and writers.
				468	\end{classdesc}
				469
				470	\class{StreamReaderWriter} instances define the combined interfaces of
				471	\class{StreamReader} and \class{StreamWriter} classes. They inherit
all other methods and attributes from the underlying stream.
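As a sketch, assuming present-day Python~3, the factories returned by \function{lookup()} can be passed straight to the constructor. An in-memory \code{io.BytesIO} stream stands in for a file opened for both reading and writing.

```python
import codecs
import io

# Take the two stream factories from the registry entry.
_, _, Reader, Writer = codecs.lookup("utf-8")

stream = io.BytesIO()
wrapped = codecs.StreamReaderWriter(stream, Reader, Writer, "strict")

# Write through the StreamWriter side, then read back through the reader side.
wrapped.write("\xe9t\xe9")
wrapped.seek(0)
text = wrapped.read()
```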
				473
				474
				475	\subsubsection{StreamRecoder Objects \label{stream-recoder-objects}}
				476
				477	The \class{StreamRecoder} provide a frontend - backend view of
				478	encoding data which is sometimes useful when dealing with different
				479	encoding environments.
				480
				481	The design is such that one can use the factory functions returned by
				482	the \function{lookup()} function to construct the instance.
				483
				484	\begin{classdesc}{StreamRecoder}{stream, encode, decode,
				485	Reader, Writer, errors}
				486	Creates a \class{StreamRecoder} instance which implements a two-way
				487	conversion: \var{encode} and \var{decode} work on the frontend (the
				488	input to \method{read()} and output of \method{write()}) while
				489	\var{Reader} and \var{Writer} work on the backend (reading and
				490	writing to the stream).
				491
				492	You can use these objects to do transparent direct recodings from
				493	e.g.\ Latin-1 to UTF-8 and back.
				494
				495	\var{stream} must be a file-like object.
				496
				497	\var{encode}, \var{decode} must adhere to the \class{Codec}
				498	interface, \var{Reader}, \var{Writer} must be factory functions or
Raymond Hettinger	f17d65d	2003-08-12 00:01:16 +0000	[diff] [blame]	499	classes providing objects of the \class{StreamReader} and
Fred Drake	602aa77	2000-10-12 20:50:55 +0000	[diff] [blame]	500	\class{StreamWriter} interface respectively.
				501
				502	\var{encode} and \var{decode} are needed for the frontend
				503	translation, \var{Reader} and \var{Writer} for the backend
				504	translation. The intermediate format used is determined by the two
				505	sets of codecs, e.g. the Unicode codecs will use Unicode as
				506	intermediate encoding.
				507
				508	Error handling is done in the same way as defined for the
				509	stream readers and writers.
				510	\end{classdesc}
				511
				512	\class{StreamRecoder} instances define the combined interfaces of
				513	\class{StreamReader} and \class{StreamWriter} classes. They inherit
all other methods and attributes from the underlying stream.
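A minimal sketch of a two-way recoding, assuming present-day Python~3, where the frontend exchanges byte strings. The frontend here speaks UTF-8 while the backing stream stores Latin-1; \code{io.BytesIO} stands in for a real file.

```python
import codecs
import io

# Frontend: stateless UTF-8 functions. Backend: Latin-1 stream factories.
utf8_encode, utf8_decode, _, _ = codecs.lookup("utf-8")
_, _, Latin1Reader, Latin1Writer = codecs.lookup("latin-1")

backing = io.BytesIO()
recoder = codecs.StreamRecoder(
    backing, utf8_encode, utf8_decode, Latin1Reader, Latin1Writer, "strict"
)

# UTF-8 bytes written to the frontend are stored as Latin-1.
recoder.write("caf\xe9".encode("utf-8"))
stored = backing.getvalue()

# Reading converts the stored Latin-1 data back to UTF-8.
backing.seek(0)
round_trip = recoder.read()
```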
				515
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	516	\subsection{Standard Encodings}
				517
Python comes with a number of codecs built in, either implemented as C
				519	functions, or with dictionaries as mapping tables. The following table
				520	lists the codecs by name, together with a few common aliases, and the
				521	languages for which the encoding is likely used. Neither the list of
				522	aliases nor the list of languages is meant to be exhaustive. Notice
				523	that spelling alternatives that only differ in case or use a hyphen
				524	instead of an underscore are also valid aliases.
				525
				526	Many of the character sets support the same languages. They vary in
				527	individual characters (e.g. whether the EURO SIGN is supported or
				528	not), and in the assignment of characters to code positions. For the
				529	European languages in particular, the following variants typically
				530	exist:
				531
				532	\begin{itemize}
				533	\item an ISO 8859 codeset
				534	\item a Microsoft Windows code page, which is typically derived from
				535	a 8859 codeset, but replaces control characters with additional
				536	graphic characters
				537	\item an IBM EBCDIC code page
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	538	\item an IBM PC code page, which is \ASCII{} compatible
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	539	\end{itemize}
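The differences between these variants are easy to observe. The sketch below assumes present-day Python~3 and uses codec names from the table that follows.

```python
import codecs

euro = "\u20ac"

# The Windows code page and ISO 8859-15 both have the EURO SIGN,
# but assign it to different code positions.
as_cp1252 = euro.encode("cp1252")
as_latin9 = euro.encode("iso8859-15")

# ISO 8859-1 has no EURO SIGN at all.
try:
    euro.encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# IBM PC code pages are ASCII compatible; EBCDIC code pages are not.
pc_a = "A".encode("cp437")
ebcdic_a = "A".encode("cp500")

# Spellings that differ only in case or hyphen/underscore name the same codec.
same_codec = codecs.lookup("ISO-8859-1").name == codecs.lookup("iso_8859_1").name
```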
				540
				541	\begin{longtableiii}{l\|l\|l}{textrm}{Codec}{Aliases}{Languages}
				542
				543	\lineiii{ascii}
				544	{646, us-ascii}
				545	{English}
				546
Hye-Shik Chang	3e2a306	2004-01-17 14:29:29 +0000	[diff] [blame]	547	\lineiii{big5}
				548	{big5_tw, csbig5}
				549	{Traditional Chinese}
				550
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	551	\lineiii{cp037}
				552	{IBM037, IBM039}
				553	{English}
				554
				555	\lineiii{cp424}
				556	{EBCDIC-CP-HE, IBM424}
				557	{Hebrew}
				558
				559	\lineiii{cp437}
				560	{437, IBM437}
				561	{English}
				562
				563	\lineiii{cp500}
				564	{EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500}
				565	{Western Europe}
				566
				567	\lineiii{cp737}
				568	{}
				569	{Greek}
				570
				571	\lineiii{cp775}
				572	{IBM775}
				573	{Baltic languages}
				574
				575	\lineiii{cp850}
				576	{850, IBM850}
				577	{Western Europe}
				578
				579	\lineiii{cp852}
				580	{852, IBM852}
				581	{Central and Eastern Europe}
				582
				583	\lineiii{cp855}
				584	{855, IBM855}
				585	{Bulgarian, Byelorussian, Macedonian, Russian, Serbian}
				586
				587	\lineiii{cp856}
				588	{}
				589	{Hebrew}
				590
				591	\lineiii{cp857}
				592	{857, IBM857}
				593	{Turkish}
				594
				595	\lineiii{cp860}
				596	{860, IBM860}
				597	{Portuguese}
				598
				599	\lineiii{cp861}
				600	{861, CP-IS, IBM861}
				601	{Icelandic}
				602
				603	\lineiii{cp862}
				604	{862, IBM862}
				605	{Hebrew}
				606
				607	\lineiii{cp863}
				608	{863, IBM863}
				609	{Canadian}
				610
				611	\lineiii{cp864}
				612	{IBM864}
				613	{Arabic}
				614
				615	\lineiii{cp865}
				616	{865, IBM865}
				617	{Danish, Norwegian}
				618
				619	\lineiii{cp869}
				620	{869, CP-GR, IBM869}
				621	{Greek}
				622
				623	\lineiii{cp874}
				624	{}
				625	{Thai}
				626
				627	\lineiii{cp875}
				628	{}
				629	{Greek}
				630
Hye-Shik Chang	3e2a306	2004-01-17 14:29:29 +0000	[diff] [blame]	631	\lineiii{cp932}
				632	{932, ms932, mskanji, ms_kanji}
				633	{Japanese}
				634
				635	\lineiii{cp949}
				636	{949, ms949, uhc}
				637	{Korean}
				638
				639	\lineiii{cp950}
				640	{950, ms950}
				641	{Traditional Chinese}
				642
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	643	\lineiii{cp1006}
				644	{}
				645	{Urdu}
				646
				647	\lineiii{cp1026}
				648	{ibm1026}
				649	{Turkish}
				650
				651	\lineiii{cp1140}
				652	{ibm1140}
				653	{Western Europe}
				654
				655	\lineiii{cp1250}
				656	{windows-1250}
				657	{Central and Eastern Europe}
				658
				659	\lineiii{cp1251}
				660	{windows-1251}
				661	{Bulgarian, Byelorussian, Macedonian, Russian, Serbian}
				662
				663	\lineiii{cp1252}
				664	{windows-1252}
				665	{Western Europe}
				666
				667	\lineiii{cp1253}
				668	{windows-1253}
				669	{Greek}
				670
				671	\lineiii{cp1254}
				672	{windows-1254}
				673	{Turkish}
				674
				675	\lineiii{cp1255}
				676	{windows-1255}
				677	{Hebrew}
				678
				679	\lineiii{cp1256}
				680	{windows-1256}
				681	{Arabic}
				682
				683	\lineiii{cp1257}
				684	{windows-1257}
				685	{Baltic languages}
				686
				687	\lineiii{cp1258}
				688	{windows-1258}
				689	{Vietnamese}
				690
Hye-Shik Chang	3e2a306	2004-01-17 14:29:29 +0000	[diff] [blame]	691	\lineiii{euc_jp}
				692	{eucjp, ujis, u_jis}
				693	{Japanese}
				694
				695	\lineiii{euc_jisx0213}
				696	{jisx0213, eucjisx0213}
				697	{Japanese}
				698
				699	\lineiii{euc_kr}
				700	{euckr, korean, ksc5601, ks_c_5601, ks_c_5601_1987, ksx1001, ks_x_1001}
				701	{Korean}
				702
				703	\lineiii{gb2312}
				704	{chinese, csiso58gb231280, euc_cn, euccn, eucgb2312_cn, gb2312_1980,
				705	gb2312_80, iso_ir_58}
				706	{Simplified Chinese}
				707
				708	\lineiii{gbk}
				709	{936, cp936, ms936}
				710	{Unified Chinese}
				711
				712	\lineiii{gb18030}
				713	{gb18030_2000}
				714	{Unified Chinese}
				715
				716	\lineiii{hz}
				717	{hzgb, hz_gb, hz_gb_2312}
				718	{Simplified Chinese}
				719
				720	\lineiii{iso2022_jp}
				721	{csiso2022jp, iso2022jp, iso_2022_jp}
				722	{Japanese}
				723
				724	\lineiii{iso2022_jp_1}
				725	{iso2022jp_1, iso_2022_jp_1}
				726	{Japanese}
				727
				728	\lineiii{iso2022_jp_2}
				729	{iso2022jp_2, iso_2022_jp_2}
				730	{Japanese, Korean, Simplified Chinese, Western Europe, Greek}
				731
				732	\lineiii{iso2022_jp_3}
				733	{iso2022jp_3, iso_2022_jp_3}
				734	{Japanese}
				735
				736	\lineiii{iso2022_jp_ext}
				737	{iso2022jp_ext, iso_2022_jp_ext}
				738	{Japanese}
				739
				740	\lineiii{iso2022_kr}
				741	{csiso2022kr, iso2022kr, iso_2022_kr}
				742	{Korean}
				743
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	744	\lineiii{latin_1}
				745	{iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1}
				746	{Western Europe}
				747
				748	\lineiii{iso8859_2}
				749	{iso-8859-2, latin2, L2}
				750	{Central and Eastern Europe}
				751
				752	\lineiii{iso8859_3}
				753	{iso-8859-3, latin3, L3}
				754	{Esperanto, Maltese}
				755
				756	\lineiii{iso8859_4}
				757	{iso-8859-4, latin4, L4}
				758	{Baltic languages}
				759
				760	\lineiii{iso8859_5}
				761	{iso-8859-5, cyrillic}
				762	{Bulgarian, Byelorussian, Macedonian, Russian, Serbian}
				763
				764	\lineiii{iso8859_6}
				765	{iso-8859-6, arabic}
				766	{Arabic}
				767
				768	\lineiii{iso8859_7}
				769	{iso-8859-7, greek, greek8}
				770	{Greek}
				771
				772	\lineiii{iso8859_8}
				773	{iso-8859-8, hebrew}
				774	{Hebrew}
				775
				776	\lineiii{iso8859_9}
				777	{iso-8859-9, latin5, L5}
				778	{Turkish}
				779
				780	\lineiii{iso8859_10}
				781	{iso-8859-10, latin6, L6}
				782	{Nordic languages}
				783
				784	\lineiii{iso8859_13}
				785	{iso-8859-13}
				786	{Baltic languages}
				787
				788	\lineiii{iso8859_14}
				789	{iso-8859-14, latin8, L8}
				790	{Celtic languages}
				791
				792	\lineiii{iso8859_15}
				793	{iso-8859-15}
				794	{Western Europe}
				795
Hye-Shik Chang	3e2a306	2004-01-17 14:29:29 +0000	[diff] [blame]	796	\lineiii{johab}
				797	{cp1361, ms1361}
				798	{Korean}
				799
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	800	\lineiii{koi8_r}
				801	{}
				802	{Russian}
				803
				804	\lineiii{koi8_u}
				805	{}
				806	{Ukrainian}
				807
				808	\lineiii{mac_cyrillic}
				809	{maccyrillic}
				810	{Bulgarian, Byelorussian, Macedonian, Russian, Serbian}
				811
				812	\lineiii{mac_greek}
				813	{macgreek}
				814	{Greek}
				815
				816	\lineiii{mac_iceland}
				817	{maciceland}
				818	{Icelandic}
				819
				820	\lineiii{mac_latin2}
				821	{maclatin2, maccentraleurope}
				822	{Central and Eastern Europe}
				823
				824	\lineiii{mac_roman}
				825	{macroman}
				826	{Western Europe}
				827
				828	\lineiii{mac_turkish}
				829	{macturkish}
				830	{Turkish}
				831
Hye-Shik Chang	5c5316f	2004-03-19 08:06:07 +0000	[diff] [blame]	832	\lineiii{ptcp154}
				833	{csptcp154, pt154, cp154, cyrillic-asian}
				834	{Kazakh}
				835
Hye-Shik Chang	3e2a306	2004-01-17 14:29:29 +0000	[diff] [blame]	836	\lineiii{shift_jis}
				837	{csshiftjis, shiftjis, sjis, s_jis}
				838	{Japanese}
				839
				840	\lineiii{shift_jisx0213}
				841	{shiftjisx0213, sjisx0213, s_jisx0213}
				842	{Japanese}
				843
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	844	\lineiii{utf_16}
				845	{U16, utf16}
				846	{all languages}
				847
				848	\lineiii{utf_16_be}
				849	{UTF-16BE}
				850	{all languages (BMP only)}
				851
				852	\lineiii{utf_16_le}
				853	{UTF-16LE}
				854	{all languages (BMP only)}
				855
				856	\lineiii{utf_7}
				857	{U7}
				858	{all languages}
				859
				860	\lineiii{utf_8}
				861	{U8, UTF, utf8}
				862	{all languages}
				863
				864	\end{longtableiii}
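
The alias column can be checked against the registry itself. The
following is a minimal sketch, assuming a Python 3 interpreter (where
Unicode strings are \code{str} and byte strings are \code{bytes}); the
helper name \code{canonical_name} is made up for this illustration:

```python
import codecs

def canonical_name(alias):
    # Hypothetical helper: ask the codec registry which codec an alias
    # resolves to and return the name that codec reports for itself.
    return codecs.lookup(alias).name

# Two aliases from the latin_1 row resolve to the same codec.
same_codec = canonical_name("latin1") == canonical_name("iso-8859-1")

# A codec name and one of its aliases are interchangeable when
# encoding and decoding.
text = "caf\u00e9"
data = text.encode("cp1252")
round_trip = data.decode("windows-1252")
```

Lookup is case-insensitive and treats hyphens and underscores alike,
which is why the spelling of an alias in the table is not significant.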
				865
				866	A number of codecs are specific to Python, so their codec names have
				867	no meaning outside Python. Some of them don't convert from Unicode
				868	strings to byte strings, but instead use the property of the Python
				869	codecs machinery that any bijective function with one argument can be
				870	considered as an encoding.
				871
				872	For the codecs listed below, the result in the ``encoding'' direction
				873	is always a byte string. The result of the ``decoding'' direction is
				874	listed as operand type in the table.
				875
				876	\begin{tableiv}{l\|l\|l\|l}{textrm}{Codec}{Aliases}{Operand type}{Purpose}
				877
				878	\lineiv{base64_codec}
				879	{base64, base-64}
				880	{byte string}
				881	{Convert operand to MIME base64}
				882
Raymond Hettinger	9a80c5d	2003-09-23 20:21:01 +0000	[diff] [blame]	883	\lineiv{bz2_codec}
				884	{bz2}
				885	{byte string}
				886	{Compress the operand using bz2}
				887
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	888	\lineiv{hex_codec}
				889	{hex}
				890	{byte string}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	891	{Convert operand to hexadecimal representation, with two
				892	digits per byte}
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	893
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	894	\lineiv{idna}
				895	{}
				896	{Unicode string}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	897	{Implements \rfc{3490}.
Raymond Hettinger	aa1178b	2003-09-01 23:13:04 +0000	[diff] [blame]	898	\versionadded{2.3}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	899	See also \refmodule{encodings.idna}}
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	900
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	901	\lineiv{mbcs}
				902	{dbcs}
				903	{Unicode string}
				904	{Windows only: Encode operand according to the ANSI codepage (CP_ACP)}
				905
				906	\lineiv{palmos}
				907	{}
				908	{Unicode string}
				909	{Encoding of PalmOS 3.5}
				910
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	911	\lineiv{punycode}
				912	{}
				913	{Unicode string}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	914	{Implements \rfc{3492}.
				915	\versionadded{2.3}}
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	916
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	917	\lineiv{quopri_codec}
				918	{quopri, quoted-printable, quotedprintable}
				919	{byte string}
				920	{Convert operand to MIME quoted printable}
				921
				922	\lineiv{raw_unicode_escape}
				923	{}
				924	{Unicode string}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	925	{Produce a string that is suitable as raw Unicode literal in
				926	Python source code}
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	927
				928	\lineiv{rot_13}
				929	{rot13}
				930	{byte string}
				931	{Returns the Caesar-cypher encryption of the operand}
				932
				933	\lineiv{string_escape}
				934	{}
				935	{byte string}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	936	{Produce a string that is suitable as string literal in
				937	Python source code}
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	938
				939	\lineiv{undefined}
				940	{}
				941	{any}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	942	{Raise an exception for all conversions. Can be used as the
				943	system encoding if no automatic coercion between byte and
				944	Unicode strings is desired.}
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	945
				946	\lineiv{unicode_escape}
				947	{}
				948	{Unicode string}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	949	{Produce a string that is suitable as Unicode literal in
				950	Python source code}
Martin v. Löwis	5c37a77	2002-12-31 12:39:07 +0000	[diff] [blame]	951
				952	\lineiv{unicode_internal}
				953	{}
				954	{Unicode string}
				955	{Return the internal representation of the operand}
				956
				957	\lineiv{uu_codec}
				958	{uu}
				959	{byte string}
				960	{Convert the operand using uuencode}
				961
				962	\lineiv{zlib_codec}
				963	{zip, zlib}
				964	{byte string}
				965	{Compress the operand using zlib}
				966
				967	\end{tableiv}
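
As an illustration of codecs whose operand is a byte string, the
sketch below round-trips data through \code{hex_codec} and
\code{zlib_codec}. It assumes a Python 3 interpreter, where such
codecs are reached through \function{codecs.encode()} and
\function{codecs.decode()} rather than through string methods; the
variable names are illustrative only.

```python
import codecs

payload = b"hello"

# hex_codec: two hexadecimal digits per input byte.
hex_form = codecs.encode(payload, "hex_codec")
restored = codecs.decode(hex_form, "hex_codec")

# zlib_codec: "encoding" compresses, "decoding" decompresses.
repetitive = payload * 50
packed = codecs.encode(repetitive, "zlib_codec")
unpacked = codecs.decode(packed, "zlib_codec")
```

In both cases the decoding direction undoes the encoding direction
exactly, which is the bijective property described above.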
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	968
				969	\subsection{\module{encodings.idna} ---
				970	Internationalized Domain Names in Applications}
				971
				972	\declaremodule{standard}{encodings.idna}
				973	\modulesynopsis{Internationalized Domain Names implementation}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	974	% XXX The next line triggers a formatting bug, so it's commented out
				975	% until that can be fixed.
				976	%\moduleauthor{Martin v. L\"owis}
				977
				978	\versionadded{2.3}
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	979
				980	This module implements \rfc{3490} (Internationalized Domain Names in
				981	Applications) and \rfc{3491} (Nameprep: A Stringprep Profile for
				982	Internationalized Domain Names (IDN)). It builds upon the
Fred Drake	d24c767	2003-07-16 05:17:23 +0000	[diff] [blame]	983	\code{punycode} encoding and \refmodule{stringprep}.
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	984
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	985	These RFCs together define a protocol to support non-\ASCII{} characters
				986	in domain names. A domain name containing non-\ASCII{} characters (such
Fred Drake	d24c767	2003-07-16 05:17:23 +0000	[diff] [blame]	987	as ``www.Alliancefran\c caise.nu'') is converted into an
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	988	\ASCII-compatible encoding (ACE, such as
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	989	``www.xn--alliancefranaise-npb.nu''). The ACE form of the domain name
				990	is then used in all places where arbitrary characters are not allowed
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	991	by the protocol, such as DNS queries, HTTP \mailheader{Host} fields, and so
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	992	on. This conversion is carried out in the application and, if possible,
				993	is invisible to the user: the application should transparently convert
				994	Unicode domain labels to IDNA on the wire, and convert ACE labels back
				995	to Unicode before presenting them to the user.
				996
				997	Python supports this conversion in several ways: the \code{idna} codec
				998	converts between Unicode and the ACE form. Furthermore, the
Fred Drake	d24c767	2003-07-16 05:17:23 +0000	[diff] [blame]	999	\refmodule{socket} module transparently converts Unicode host names to
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	1000	ACE, so that applications need not be concerned about converting host
				1001	names themselves when they pass them to the socket module. On top of
				1002	that, modules that have host names as function parameters, such as
Fred Drake	d24c767	2003-07-16 05:17:23 +0000	[diff] [blame]	1003	\refmodule{httplib} and \refmodule{ftplib}, accept Unicode host names
				1004	(\refmodule{httplib} then also transparently sends an IDNA hostname in
				1005	the \mailheader{Host} field if it sends that field at all).
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	1006
				1007	When receiving host names from the wire (such as in reverse name
				1008	lookup), no automatic conversion to Unicode is performed: Applications
				1009	wishing to present such host names to the user should decode them to
				1010	Unicode.
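
The two directions can be sketched with the \code{idna} codec, using
the domain name from the example above. This assumes a Python 3
interpreter, where the ACE form is a \code{bytes} object; the variable
names are illustrative.

```python
# Unicode host name as an application might receive it from the user.
name = "www.alliancefran\u00e7aise.nu"

# Convert to the ASCII-compatible encoding before putting it on the wire.
ace = name.encode("idna")

# Convert a host name received from the wire back to Unicode
# before presenting it to the user.
shown = ace.decode("idna")
```

Labels that are already plain \ASCII{} (here ``www'' and ``nu'') pass
through unchanged; only the label containing a non-\ASCII{} character
receives the ``xn--'' prefix.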
				1011
				1012	The module \module{encodings.idna} also implements the nameprep
				1013	procedure, which performs certain normalizations on host names, to
				1014	achieve case-insensitivity of international domain names, and to unify
				1015	similar characters. The nameprep functions can be used directly if
				1016	desired.
				1017
				1018	\begin{funcdesc}{nameprep}{label}
				1019	Return the nameprepped version of \var{label}. The implementation
				1020	currently assumes query strings, so \code{AllowUnassigned} is
				1021	true.
				1022	\end{funcdesc}
				1023
Raymond Hettinger	b5155e3	2003-06-18 01:58:31 +0000	[diff] [blame]	1024	\begin{funcdesc}{ToASCII}{label}
Fred Drake	d4be747	2003-04-30 15:02:07 +0000	[diff] [blame]	1025	Convert a label to \ASCII, as specified in \rfc{3490}.
Martin v. Löwis	2548c73	2003-04-18 10:39:54 +0000	[diff] [blame]	1026	\code{UseSTD3ASCIIRules} is assumed to be false.
				1027	\end{funcdesc}
				1028
				1029	\begin{funcdesc}{ToUnicode}{label}
				1030	Convert a label to Unicode, as specified in \rfc{3490}.
				1031	\end{funcdesc}
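
The three functions operate on a single label, not on a full dotted
domain name. The following minimal sketch assumes a Python 3
interpreter, where \function{ToASCII()} returns a \code{bytes} object;
the label values are chosen only for illustration.

```python
from encodings.idna import nameprep, ToASCII, ToUnicode

# nameprep normalizes a label; among other things it folds case.
prepped = nameprep("EXAMPLE")

# ToASCII and ToUnicode convert one label at a time.
label = "alliancefran\u00e7aise"
ace_label = ToASCII(label)
unicode_label = ToUnicode(ace_label)
```

To convert a complete host name, split it at the dots and convert each
label, or use the \code{idna} codec, which does this splitting itself.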